Crawly Crypt Collection 1




Text File | 1985-11-20 | 114KB | 2,575 lines

Atari ST Machine Specific Programming In Assembly

Chapter 5: Performance Testing

The Never Finished Theory

In a recent magazine article, the author stated that no program is ever finished. I have seen that viewpoint expressed many times. Sympathetically, I agree in principle with the emotional flavor of this statement, I suppose, but when I am working on a program, I always reach a point at which I can conclude that its performance is satisfactory. At that stage I say that the program is finished.

The reason that some programmers, and perhaps some users also, come to accept the never finished concept as gospel is that they have seen too many programs, either purchased or written, that never seem to perform completely satisfactorily, and, therefore, seem to continuously require fine tuning or corrections. But this program attribute is not an inherent consequence of program development. The problem with such programs is that their performance was judged to be satisfactory prematurely.

Too often, the performance of a program is judged to be satisfactory by its author if the program seems to accomplish its primary function after a few cursory tests. Program testing, like program documentation, seems to be a distasteful chore to many programmers. That's probably why so many programs are thrust into the software market prematurely.

The attitude that I have developed is one which views algorithmic design, documentation and testing as steps in a single process, each of which demands the same level of concentration, concern and quality control. If you can adopt a similar attitude, I guarantee that you will be a happier, more successful programmer than one who finds any phase of program development boring or distasteful.

Documentation is your front line defense against programming catastrophes. To be able to fix a program, at repair time, you must be able to understand it as well as you did when you wrote it.
The same level of understanding is required when you decide to intentionally enhance a program. If program documentation includes the results of performance testing, then a program's prior performance can be used to gauge performance after alterations.

When a program seems to be malfunctioning, the first action you take should be to compare its current performance to past performance under known conditions. Many times, such comparisons will reveal that the execution environment, not the program, is at fault. Of course, it is then that you may decide that a new version of the program is required to cope with an altered execution environment.

The Three For One Theory

When I was working on a large mainframe, the manufacturer to remain nameless, for a company that shall also remain nameless, we programmers developed a formula for bug introduction into the mainframe's operating system. For every bug fixed, three more were introduced. This is, of course, one of Murphy's laws.

Sometimes the new bugs were called enhancements to obscure the fact that they were screwups. But that's what all bugs are. They are errors that you make when you write your programs. This is the first truth that you should hold to be self-evident, if you want to develop programs that eventually perform satisfactorily. Once you realize that errors in your programs will be there because of your own carelessness, or in spite of your best efforts, you can take steps to prevent them from being catastrophic.

Realistic Expectations

When I judge the performance of an item of software or hardware that I have purchased, I compare its performance to the levels at which I have been led to believe it should be according to the product's designer, manufacturer and seller. To that extent, if the product fails to meet my expectations, then I have been cheated. If I am fooled twice by the same designer, manufacturer or seller, then I have cheated myself.
When I judge the performance of an item of software or hardware that I have designed and constructed, I restrict my expectations to levels that are commensurate with my knowledge, experience and available tools. If I have done my best, then I can do no more, unless I decide to redesign and reconstruct after obtaining more knowledge, more experience or better tools.

The performance of an item of software is inherently restricted by the design of the computer system; that should be obvious. Performance is also influenced by the extent of the programmer's knowledge about the system and by the programmer's developed programming ability. A programmer's ability is developed via education and experience. To the extent that this ability restricts the performance of the final product, it is a constituent of the overall programming environment.

If a program's performance depends on your programming ability, how can you determine when its performance is satisfactory? Well, when a program executes a task according to your specifications, then its performance must be judged satisfactory. How stringent should your specifications be? Your performance demands must be commensurate with your programming ability. When you have exerted your best effort, you must be satisfied with the final product; or you must obtain another system; or you must accumulate more knowledge and experience, so that you can demand more stringent specifications.

Accumulating knowledge about your computer system's capabilities can be a horrendous task: cataloging your system's real, versus advertised or reported, capabilities requires extensive performance testing of the system, because you can't trust someone else's assessments. For example, on page 18 of my Star NX-10 user's manual there is a description of a self-test and the following statements:

     "Were you surprised? It's fast, isn't it? About 120 characters a
     second, to be exact."

Would any serious person use the words "about" and "exact" in the way they are used above?
When I execute the test on my printer, 503 characters are printed. The elapsed time according to my stopwatch was 6 seconds. This means that the printer prints about 84 characters per second in that test mode. Even allowing for a one second error in timing, the printing speed would be increased to only about 100 characters per second. In order for the printer to meet specifications, it would have to print the 503 characters in 4.19 seconds.

Again, on page 213 of the manual the printing speed in Draft pica mode is specified to be 120 characters per second; no "about" qualifier there. When printing an ASCII file in that mode, with the entire file contained in the printer's buffer, so that the printer's speed depends only upon its own capability, I measure a maximum printing speed of 68 characters per second.

Are the manufacturer's specifications incorrect? Is my method of timing the printer's speed incorrect? I have learned not to be absolutely sure about anything in this world, but I think my time would be wasted if I were to spend it trying to develop a program which depended on the printer's ability to print at 120 characters per second. That would be an unrealistic expectation.

Performance Measuring Tools

Because of its dependency on ability and personal assessment, performance testing must be, to some extent, subjective. In chapter 1, I said that I have been satisfied with the Star NX-10, and I have been, in spite of the printing speed controversy. The printer's other capabilities and its low cost more than compensate for that discrepancy, if it actually exists. Therefore, in my opinion, the performance of the printer is satisfactory. This is my personal assessment.

Of course, one might be inclined to scold me, pointing out that my method of measuring elapsed time during the printing speed tests was crude. To which I would reply, "It's the only method that was available to me."
And, I might add, I have found it to be much more reliable than words printed on paper in a user's manual. Any user's manual.

One individual's judgement of the overall performance of a particular item of software is as subjective as are my conclusions about the Star NX-10. But specific software attributes can be judged objectively, if tools which can measure pertinent aspects of performance are available. I am going to provide you with some of those tools in this chapter. I will introduce utility programs with which the efficiency of individual instructions, algorithms and programs may be compared.

I will provide programs that perform many comparisons, but, since the subject of performance testing must be restricted to a reasonable length in the book, I will concentrate more on showing you, by example, when and how I decide to conduct performance tests, rather than flit through comparisons until you become bored with the whole idea.

The First Utility

While the primary objective of the chapter is to provide performance measuring utilities, a secondary objective is to illustrate specific stages of program development. I begin with the specifications for a utility to be called SPEEDTST. Then I introduce the first of a series of programs, each of which is a model that represents a snapshot of a continuous process. The other programs follow after the introduction of a program on which the models can operate.

The programs introduced in this chapter invoke the custom traps described in program 13. To install the custom traps, execute TRAPS.PRG. If you want the traps to be automatically installed during system boot, copy TRAPS.PRG to the AUTO folder on your boot partition or floppy disk.

The first utility will calculate a program's load and execution times. As I concluded chapter 3, I mentioned that the first stage of increasing a program's execution speed involves getting it into RAM as quickly as possible. Methods of doing that will be discussed in this chapter.
In order to discuss a variety of methods, I need a way to measure the time required to load and execute a program.

Specifications For SPEEDTST

SPEEDTST must accomplish the following:

     1. Spawn a process; that is, load and execute a program.

     2. Programs to be spawned will have a TOS or PRG suffix.

     3. The spawned program will reside in the same directory as does
        SPEEDTST.

     4. Create a disk file which is to be identified by the name of the
        spawned program with a DAT suffix. The disk file is to reside in
        the same directory as does SPEEDTST.

     5. Calculate the spawned program's load and execution times.

     6. Store the load and execution times in the disk file described in
        item 4.

     7. If the spawned process directs output to the video screen via
        GEMDOS function $9, redirect that output to the file described in
        item 4.

The First Model

Program 16 is the first in a series of four programs which progress in algorithmic perfection until the program SPEEDTST is developed. SPEED_1 is the first working model of a parent program which loads and executes a child program. The parent calculates the spawned program's load and execution times, using information returned to the parent when the child terminates. The parent creates a disk file and stores the calculated values therein. If the child directs output to the screen using GEMDOS function $9, that output will be redirected to the file. The name of the file created by the parent is composed of the name of the child, without suffix, plus the extension DAT.

While it is doing all of that, the parent also confirms that trap #6 has been installed by TRAPS.PRG and functions correctly. The parent accomplishes the verification simply by being able to spawn, which it can't do if custom trap #6 fails to return excess memory to the operating system. Trap #6 also performs another function, but its effectiveness is confirmed only if the child terminates using custom trap #8. Refer to the extensive note in the data section of program 16.
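The arithmetic that the parent performs can be sketched outside of assembly. The following Python fragment is only an illustration and is not one of the book's listings. It assumes that the three clock samples are 16 bit counts of the 200 Hz system timer (5 milliseconds per tick), as the listings in this chapter describe. The names elapsed_ticks, load_and_execute_ms and data_file_name are hypothetical, and so are the sample tick values. Unlike the assembly, which sign extends a word size difference, the sketch masks the difference to 16 bits so that a single wraparound of the counter is tolerated.

```python
TICK_MS = 5          # one tick of the 200 Hz system timer lasts 5 milliseconds
WORD_MASK = 0xFFFF   # assumption: only the low 16 bits of the counter are used


def elapsed_ticks(earlier: int, later: int) -> int:
    """Ticks between two 16 bit samples, tolerating one counter wraparound."""
    return (later - earlier) & WORD_MASK


def load_and_execute_ms(start: int, after_load: int, end: int) -> tuple:
    """Return (load time, execution time) in milliseconds.

    start      = clock sample taken by the parent just before spawning
    after_load = clock sample the child takes as soon as it is loaded
    end        = clock sample taken by the parent when the child returns
    """
    load_ms = elapsed_ticks(start, after_load) * TICK_MS
    execute_ms = elapsed_ticks(after_load, end) * TICK_MS
    return load_ms, execute_ms


def data_file_name(program_name: str) -> str:
    """Name of the child without its suffix, plus the DAT extension."""
    stem = program_name.split(".", 1)[0]
    return stem + ".DAT"


if __name__ == "__main__":
    # Hypothetical samples: 9 ticks to load, 136 ticks to execute.
    print(load_and_execute_ms(1000, 1009, 1145))   # (45, 680)
    print(data_file_name("PRG_5AP.TOS"))           # PRG_5AP.DAT
```

Masking instead of sign extending is a choice made for this sketch only; it changes the result solely for intervals longer than 32767 ticks.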
Program 16 must be assembled in PC-relative mode and the executable file must be saved with a TTP extension. When it is executed, the program to be spawned must reside in the same directory as does program 16. Type the name of the program to be spawned on program 16's input parameter line. As you shall see, a program that is to be spawned by program 16 must be specifically prepared for the spawning operation.

Program 16, as do programs 18 and 19, invokes custom traps which must be installed by programs TRAPS.PRG (program 13, chapter 4) and TRAP_9.PRG (program 15, chapter 5). Therefore, these programs must be executed from the desktop or from the AUTO folder of a boot partition or floppy before programs SPEED_1.TTP, SPEED_2.TTP or SPEED_3.TTP are executed. TRAP_9.S follows.

Program 15. This program installs a custom trap for programs 16, 18 and 19.

; Program Name: TRAP_9.S
; Version 1.002

; Assembly Instructions:

; Assemble in PC-relative mode and save with a PRG extension.

; Program Function:

; This is a LSR program that establishes a user defined trap. It may be
; executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.

; MAJOR NOTE: SEE FURTHER DOCUMENTATION FOR THIS PROGRAM IN TRAPS.S.

; Trap #9 is special in that it is only used by three programs: SPEED_1.TTP,
; SPEED_2.TTP and SPEED_3.TTP. The custom trap is used simply to reduce the
; size of those programs.

; This program invokes a custom trap that is established by TRAPS.PRG,
; therefore, that program must be executed before trap #9 is invoked by a
; program.

program_start:                      ; Calculate program size and retain result.
        lea       program_end, a3   ; Fetch program end address.
        suba.l    4(a7), a3         ; Subtract basepage address.

enter_supervisor_mode:
        move.l    #0, -(sp)         ; The zero turns on supervisor mode.
        move.w    #$20, -(sp)       ; Function = super = GEMDOS $20.
        trap      #1                ; Go to supervisor mode.
        addq.l    #6, sp            ; Supervisor stack pointer (SSP) returned in D0.
        movea.l   d0, a5            ; Save SSP in scratch register.

install_trap_9_routine:             ; Note: pointer = vector = pointer.
        lea       trap_9_routine, a0  ; Fetch address of trap #9 routine.
        move.l    a0, $A4           ; Store custom trap address in pointer.

enter_user_mode:
        pea       (a5)              ; Restore supervisor stack pointer.
        move.w    #$20, -(sp)       ; Function = super = GEMDOS $20.
        trap      #1                ; Go to user mode.
        addq.l    #6, sp            ; Reset stack pointer to top of stack.

relinquish_processor_control:       ; Maintain memory residency.
        move.w    #0, -(sp)         ; See page 121 of Internals book.
        move.l    a3, -(sp)         ; Program size.
        move.w    #$31, -(sp)       ; Function = ptermres = GEMDOS $31.
        trap      #1

trap_9_routine:

; Expects a program's load time in register D3 as a binary number. This
; algorithm converts the value in D3 to milliseconds (msec) then prints the
; load time in decimal msec.

; Also expects a program's execution time in register D5. The same service
; is performed for the value in that register.

convert_load_time_to_msec:
        move.l    d3, d0            ; Save a copy to add.
        asl.l     #2, d3            ; Shift to multiply by 4.
        add.l     d0, d3            ; To complete multiplication by 5.
print_load_time:
        cmpi.l    #999, d3          ; If load time is less than 1000, then
        bgt       no_space          ; print a leading blank space for output
        lea       space, a0         ; alignment.
        bsr       print_string
        cmpi.l    #99, d3           ; If load time is less than 100, then
        bgt       no_space          ; print another leading blank space.
        lea       space, a0
        bsr       print_string
no_space:
        move.l    d3, d1            ; Copy load time to D1 for decimal conversion.
        trap      #4                ; Returns address of decimal string in A0.
        bsr.s     print_string
        lea       units_label, a0
        bsr.s     print_string

convert_execution_time_to_msec:
        lea       execute_time_msg, a0
        bsr.s     print_string
        move.l    d5, d0            ; Save a copy to add.
        asl.l     #2, d5            ; Shift to multiply by 4.
        add.l     d0, d5            ; To complete multiplication by 5.
print_execution_time:
        cmpi.l    #999, d5          ; If execute time is less than 1000, then
        bgt       _no_space         ; print a leading blank space for output
        lea       space, a0         ; alignment.
        bsr       print_string
        cmpi.l    #99, d5           ; If execute time is less than 100, then
        bgt.s     _no_space         ; print another leading blank space.
        lea       space, a0
        bsr.s     print_string
_no_space:
        move.l    d5, d1            ; Copy execute time for decimal conversion.
        trap      #4                ; Returns address of decimal string in A0.
        bsr.s     print_string
        lea       units_label, a0
        bsr.s     print_string
        rte

;
; Subroutine
;

print_string:                       ; Expects address of string to be in A0.
        pea       (a0)              ; Push address of string onto stack.
        move.w    #9, -(sp)         ; Function = c_conws = GEMDOS $9.
        trap      #1                ; GEMDOS call
        addq.l    #6, sp            ; Reset stack pointer to top of stack.
        rts

        data
space:            dc.b " ",0
execute_time_msg: dc.b " Execute time: ",0
units_label:      dc.b " milliseconds", $D,$A,0

        bss
        align                       ; Align storage on a word boundary.
program_end:      ds.l 0
        end

Program 16. A utility that computes a program's load and execution times.

; Program Name: SPEED_1.S
; Version: 1.006

; Assembly Instructions:

; Assemble in "PC-relative" mode and save with a TTP extension.

; Execution Instructions:

; SPEED_1.TTP will not execute unless the custom traps in program
; TRAPS.PRG and TRAP_9.PRG have previously been installed. The custom
; traps are installed when those programs are executed from the desktop
; or from an AUTO folder on a boot disk.

; NOTE: The time required for a program to be loaded into memory depends
; on the assembly mode used to assemble the program. This will be
; shown, using SPEEDTST.TTP, in chapter 5.

; In addition, a program's load time depends on the drive from which
; the program is loaded, the method used to format the disk on which
; the program is located, the position of the program on the disk
; and, in this case, the position of the child process relative to the
; position of the parent process.

; To eliminate the drive-variables when comparing the load and
; execution times of one program to that of another, the parent and
; the child should be isolated to an otherwise empty partition or
; floppy disk for each spawning instance.
; For example, if there are two programs involved in the comparison,
; first copy the parent, which is SPEED_1 in this case, so that it is
; the only item in the hard disk partition or on the floppy. Then,
; copy the first program to the same partition or floppy. Execute
; the parent, SPEED_1 in this case, and obtain the results.

; Remove the first program and copy the second. Execute the parent
; and obtain the results for the second program.

; Execute from the desktop. Type the name of an executable file which
; has a TOS or PRG extension on SPEED_1.TTP's input parameter line. The
; name of the program you type on the parameter line must be in the same
; directory as is SPEED_1.TTP and the program must be one that terminates
; with GEMDOS function $4C.

; Upon termination, the spawned program must return the value that is in
; memory location $4BA immediately after it has been loaded, hereafter called
; the after-load time or after-load value. Custom trap #3 (get_time) can be
; used to obtain that value. SPEED_1.TTP uses the value returned in D0 to
; calculate the spawned program's load and execution times.

; The spawned program must terminate with GEMDOS function $4C so that
; the after-load value can be returned in D0 by that function. The value
; returned in D0 by GEMDOS function $4C is limited to a 16 bit value.

; If the spawned program has any halt or wait instructions, such as wait
; for a keypress, etc., those should be commented out, then the program
; should be assembled especially for the speed test. Otherwise the
; execution time will include the time waiting for input.

; If custom trap #8 is used to terminate the program, the trap will
; execute a wait_for_keypress algorithm when the program is executed from
; the desktop, but it will omit the wait algorithm when the program is
; spawned by SPEED_1.TTP. In addition, trap #8 will return the after-load
; value to SPEED_1.TTP and terminate the spawned program with GEMDOS function
; $4C.
; Both trap #8 and SPEED_1.TTP require that the spawned program be
; initialized with custom trap #6. See the note in the data section, below.

; Primary Function:

; Spawn a process. Calculate the spawned program's load and execution
; times. Store these values in a disk file that is identified by the name
; of the spawned process with a DAT suffix.

; If the spawned process directs output to the screen, store that output
; in the same disk file. Note: only screen directed output processed by
; GEMDOS function $9 will be directed to the file. If BIOS function $3 is
; used for screen output, that output will not be redirected to the file.

; Secondary Function:

; Verify that trap #6 is resident and functions correctly. SPEED_1
; confirms that because it will not be able to spawn a process unless
; the trap #6 call has returned excess memory to the system.

; Description:

; SPEED_1 is the first in a series of programs which progress in
; algorithmic perfection until the program SPEEDTST is developed. Using
; this series of programs, I intend to help you experience selected stages
; of a program development process.

; The primary attribute of this development process is its dependence,
; during the early stages of development, on familiar documented algorithms
; that can easily be found in references for many programming languages.
; After a working model has been developed with these familiar algorithms,
; attempts are made to introduce unfamiliar algorithms which may be faster
; or consume less memory.

release_excess_memory:
        lea       program_end, a0   ; Put "end of program" address in A0.
        movea.l   4(a7), a1         ; Put "basepage" address in A1.
        movea.l   a1, a4            ; Copy to A4 for command line access.
        trap      #6                ; Calculate program size and release memory.

; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.
        lea       stack, a7         ; Point A7 to this program's stack.

; The next task to be accomplished is an initialization algorithm. The
; name of the program that is to be typed on SPEED_1.TTP's input parameter
; line must be used in several ways. First, its suffix must be changed to
; DAT so that it can be passed as a NULL terminated string when GEMDOS $3C
; is invoked to create the disk file.

; Then it must be passed as a NULL terminated string with the program's
; original suffix when GEMDOS $4B is invoked to spawn the program.

; Finally, the program's name is used as part of SPEED_1.TTP's output
; header.

; The command line processing algorithm creates the required NULL terminated
; strings, storing them in locations declared in the data section of SPEED_1.

process_command_line_parameters:
        lea       $80(a4), a4       ; Fetch address of parameters.
        move.b    (a4)+, d0         ; Fetch parameter line character count.
        lea       program_name, a3  ; Load program_name address in A3.
        subq.b    #1, d0            ; Set up counter.
        ext.w     d0                ; Extend to match the size of the dbra
                                    ; instruction.

; NOTE: The dbcc instruction operates on a word length value, therefore,
; the value in the register that is to be decremented by a dbcc
; instruction must be placed there with a word size instruction, such
; as move.w #10, D0; or with a longword size instruction, as long as
; the value in the longword is limited to word size validity, or with
; a byte size instruction, as long as the value in the register is
; sign extended to word size, as is done in the instruction above.

fetch_character:
        move.b    (a4)+, (a3)+      ; Store character.
        dbra      d0, fetch_character ; Loop until d0 becomes negative.
        move.b    #0, (a3)          ; Finish with a NULL.

create_file_name:                   ; Create a file to accept standard output.
        lea       filename, a4
        lea       program_name, a3
copy_name:
        move.b    (a3)+, (a4)+
        cmpi.b    #$2E, (a3)        ; Is next byte of program_name the period?
        bne.s     copy_name         ; Continue looping until period is seen.
        move.b    #$2E, (a4)+       ; Add a period.
        move.b    #$44, (a4)+       ; Add letter 'D'.
        move.b    #$41, (a4)+       ; Add letter 'A'.
        move.b    #$54, (a4)+       ; Add letter 'T'.
        move.b    #0, (a4)          ; Add a NULL.

create_file:
        move.w    #0, -(sp)         ; File attribute = read/write.
        pea       filename          ; Will be name of spawned process + .DAT.
        move.w    #$3C, -(sp)       ; Function = f_create = GEMDOS $3C.
        trap      #1                ; File handle is returned in D0.
        addq.l    #8, sp
        lea       file_handle, a0   ; Store returned file handle.
        move.w    d0, (a0)

redirect_output:                    ; Exchange file handle with screen's handle.
        move.w    file_handle, -(sp) ; This is the disk file's handle.
        move.w    #1, -(sp)         ; This is the video screen's handle.
        move.w    #$46, -(sp)       ; Function = f_force = GEMDOS $46.
        trap      #1
        addq.l    #6, sp

get_start_time:
        lea       start_time, a3    ; Fetch address of variable "start_time".
        trap      #3                ; Returns value of system clock in D0.
        move.w    d0, (a3)          ; Save start time.

load_and_execute_program:
        pea       environ_string
        pea       command_line
        pea       program_name
        move.w    #0, -(sp)
        move.w    #$4B, -(sp)       ; Function = GEMDOS $4B = p_exec.
        trap      #1
        move.w    d0, d3            ; Copy after-load value to D3 for calculation.

get_end_time:
        trap      #3                ; Returns value of system clock in D0.
        move.w    d0, d5            ; Copy to D5 for calculation.
        sub.w     d3, d5            ; Subtract after-load time from end time.
        ext.l     d5                ; Extend to 32 bits.

; NOTE: D5 now contains the spawned program's execution time, but the time
; has not yet been converted to milliseconds. See the note below
; concerning the sign extension of D3 and D5.

reposition_stack_pointer:
        lea       $10(sp), sp

; Note the difference between the use of GEMDOS function $19 below and
; the way it is used on page 116 of the Internals book. In the
; Internals book there are two errors: (1) sp should not be referenced
; indirectly, as (sp); (2) the ASCII code for the letter A should be
; added to the contents of the register--in the internals book the
; contents of the register are added to the ASCII code for the letter
; A.

get_drive:
        move.w    #$19, -(sp)       ; Function = dgetdrv = GEMDOS $19.
        trap      #1                ; Returns 0 for drive A, 1 for B, etc.
        addq.l    #2, sp
        add.b     #$41, d0          ; Add ASCII value for A to compute ASCII
        lea       drive, a0         ; letter code for the drive value returned.
        move.b    d0, (a0)          ; Save drive's ASCII letter code.

print_heading:
        lea       heading, a0
        bsr       print_string
        lea       program_name, a0
        bsr       print_string
print_drive_for_spawned_program:
        lea       drive_msg, a0
        bsr       print_string
compute_load_time:
        lea       load_time_msg, a0
        bsr.s     print_string
        lea       start_time, a3
        sub.w     (a3), d3          ; Subtract start time from after-load time.
        ext.l     d3                ; Extend to 32 bits.

; SIGN EXTENSION NOTE

; The value in D3, above, and in D5 previously, is extended to 32 bits
; because, although the number of 200 Hz intervals we are able to utilize is
; limited to a word size by the value that is returned in D0 via GEMDOS
; function $4C, the time converted to milliseconds can extend beyond that
; word size limitation.

        trap      #9                ; See description in TRAP_9.S.

close_file:
        move.w    file_handle, -(sp)
        move.w    #$3E, -(sp)       ; Function = fclose = GEMDOS $3E.
        trap      #1
        addq.l    #4, sp

terminate:
        move.w    #0, -(sp)
        trap      #1

print_string:                       ; Expects address of string to be in A0.
        pea       (a0)              ; Push address of string onto stack.
        move.w    #9, -(sp)         ; Function = c_conws = GEMDOS $9.
        trap      #1                ; GEMDOS call
        addq.l    #6, sp            ; Reset stack pointer to top of stack.
        rts

        data
heading:       dc.b $D,$A,"SPEED_1.TTP Execution Results",$D,$A
               dc.b "for ",0
drive_msg:     dc.b ", loaded from drive: "
drive:         dc.b "A",$D,$A,0
load_time_msg: dc.b $D,$A," Load time: ",0

; NOTE: Custom trap #6 checks the environmental string pointer of each
; program that invokes it to see if the pointer contains the address
; of the label "environ_string" below. That test is performed by
; comparing the contents of the address contained in the pointer to
; the ASCII string "TERM" declared below.

; When a match occurs, it means that the program invoking trap #6 has
; been spawned by SPEED_1 (or by a similar program), therefore, trap
; #6 sets the value of the boolean variable "spawned", declared by
; TRAPS.PRG, to all ones = true.
; When custom trap #8 is invoked by a program, the state of the
; variable "spawned" is tested. If the state is true, the program
; invoking custom trap #8 is terminated with GEMDOS function $4C and
; the after-load time, which was saved by custom trap #6, is returned
; to the parent program.

; If the state of "spawned" is false, GEMDOS function $8 is executed
; so that execution will pause for a keypress. When the keypress is
; received, GEMDOS function $0 is executed.

; In this manner, custom trap #8, working in conjunction with custom
; trap #6, eliminates the "wait for keypress" algorithm automatically
; when a program is spawned by SPEED_1 (or a similar program). This
; prevents the computed execution time from being corrupted by a time
; period that involves a wait for keyboard input.

environ_string: dc.b "TERM",0
command_line:   dc.b 0
        align

        bss
start_time:     ds.w 1              ; Value in $4BA just before spawning.
file_handle:    ds.w 1              ; Handle for the filename below.
filename:       ds.l 4              ; File name for execution results.
program_name:   ds.l 4              ; Filename buffer. Must be NULL terminated.
                ds.l 96             ; Program stack.
stack:          ds.l 0              ; Address of program stack.
program_end:    ds.l 0
        end

Program 17 was prepared as a simple example to be executed by program 16 and the other programs in the series. Program 17 illustrates the use of custom traps #3, #6 and #8. Assemble programs 16 and 17, then, with their executable files in the same directory, execute program 16. Type the name of program 17's executable file on program 16's command line.

Figure 5.1 shows the contents of the file produced by program 16. The values stored in the file depend on the variables mentioned in program 16's documentation.

Program 17. Execute this program by typing PRG_5AP.TOS on SPEED_1.TTP's command line.

; Program Name: PRG_5AP.S
; Version 1.003

; Assembly Instructions:

; Assemble in PC-relative mode and save with a TOS extension.
; Execution Note:

; This program invokes custom traps which must be installed by
; TRAPS.PRG prior to its execution.

; Program Function:

; This program illustrates the use of custom traps #3, #6 and #8.

; If the program is executed from the desktop, trap #8 will execute the
; wait_for_keypress algorithm, then, when a key is pressed it will execute
; GEMDOS function 0.

; If, instead, this program is executed by typing its name on
; SPEEDTST.TTP's input parameter line, trap #8 will not execute the
; wait_for_keypress algorithm, but it will immediately execute GEMDOS
; function $4C.

; Trap #3 returns, in D0, the value of the system clock as it is
; immediately after this program has been loaded. The value in D0 is not
; corrupted before trap #6 is invoked, therefore, it is still valid when
; the trap #6 routine begins to execute. Trap #6 saves the "after-load"
; value of the system clock in its own local variable, where it is available
; for processing during the execution of trap #8.

; Trap #6 also calculates the memory occupied by this program and releases
; the memory not occupied by this program to the operating system.

fetch_load_time:
        trap      #3                ; Returns value of system clock in D0.

release_excess_memory:              ; Also stores after-load time in TRAPS bss.
        lea       program_end, a0   ; Put "end of program" address in A0.
        movea.l   4(a7), a1         ; Put "basepage" address in A1.
        trap      #6                ; Calculate program size and release memory.

waste_time:
        move.l    #$1, d0
outer_loop:
        move.l    #$FDE8, d1
inner_loop:
        move.l    #$FDE8, d2
        dbra      d1, inner_loop
        dbra      d0, outer_loop

        lea       heading, a0
        bsr.s     print_string
        lea       string, a0
        bsr.s     print_string
        trap      #8                ; Terminate.

print_string:
        pea       (a0)
        move.w    #9, -(sp)
        trap      #1
        addq.l    #6, sp
        rts

        data
heading: dc.b 'PRG_5AP.TOS Execution Results',$D,$A,$D,$A,0
string:  dc.b ' When executed from the desktop, this program will print '
         dc.b 'this string on the',$D,$A
         dc.b ' video screen and pause for a keypress. But, when this '
         dc.b 'program is spawned by',$D,$A
         dc.b ' SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will '
         dc.b 'be stored in a file ',$D,$A
         dc.b ' named PRG_5AP.DAT and the program will not pause for a '
         dc.b ' keypress.',$D,$A,0

        bss
        align
program_end: ds.l 0
        end                         ; Assembler pseudo-op.

Figure 5.1. Contents of PRG_5AP.DAT, the data file produced by program 16 to contain program 17's load and execution times.

PRG_5AP.TOS Execution Results

 When executed from the desktop, this program will print this string on the
 video screen and pause for a keypress. But, when this program is spawned by
 SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
 named PRG_5AP.DAT and the program will not pause for a  keypress.

SPEED_1.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G

 Load time:   45 milliseconds
 Execute time:  680 milliseconds

The Second Model

After program 16 was operational, I began to think about ways I might improve the command line processing algorithm. Also, I decided to try to improve the accuracy of the calculated load time by setting up the stack for GEMDOS $4B first, then invoking trap #3 to get the start time, and only then invoking trap #1 to load and execute program 17. The improvements are incorporated in program 18, the next program in the series.

In SPEED_2, the movem.l instruction is used to move the command line to four registers, then from there to a declared location in the data section.
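The effect of moving the start-time sample can be illustrated with a small simulation. This is a hedged sketch in Python and not the author's code: FakeClock, measure_early and measure_late are hypothetical names, and the tick counts are invented for illustration. The only point it makes is that any work done between the clock sample and the spawning call is charged to the load time.

```python
class FakeClock:
    """A stand-in for the system timer; it advances only when told to."""

    def __init__(self):
        self.ticks = 0

    def advance(self, count):
        self.ticks += count

    def now(self):
        return self.ticks


def measure_early(clock, setup_ticks, load_ticks):
    """Sample the clock first, then set up the call (the SPEED_1 ordering)."""
    start = clock.now()
    clock.advance(setup_ticks)   # pushing the call's parameters
    clock.advance(load_ticks)    # the load itself
    return clock.now() - start


def measure_late(clock, setup_ticks, load_ticks):
    """Set up the call first, then sample the clock (the SPEED_2 ordering)."""
    clock.advance(setup_ticks)   # pushing the call's parameters
    start = clock.now()
    clock.advance(load_ticks)    # the load itself
    return clock.now() - start


if __name__ == "__main__":
    # Invented figures: 2 ticks of setup, 9 ticks of loading.
    print(measure_early(FakeClock(), 2, 9))
    print(measure_late(FakeClock(), 2, 9))
```

With the late sample, the reported value contains only the load ticks; with the early sample, the setup ticks are included as well.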
Since this program is simply a model, and since the algorithms which create the disk file were developed in SPEED_1, I decided that there was no reason to repeat those algorithms in SPEED_2. However, I discovered that, for no apparent reason, the load time reported by SPEED_2 increased significantly, even though the experiments with SPEED_1 and SPEED_2 were executed under identical conditions. By eliminating each of SPEED_1's algorithms that are involved with the disk file, in turn, I learned that, for some reason, the load time is shorter when a file is created. Therefore, in order to maintain a valid experiment, I created a dummy file in SPEED_2, but wrote nothing to it.

But by the time I got to SPEEDTST, I realized that the file name creation algorithm was actually a part of the command line processing algorithm. Therefore, in order to validate comparisons between the three models, I had to redo SPEED_2 and SPEED_3, including a file name creation algorithm in each. While doing that, I was able to use the movem.l instruction to develop a faster creation algorithm than that used in SPEED_1.

Program 18. The next stage of SPEEDTST.TTP's development.

; Program Name: SPEED_2.S
; Version 1.003

; NOTE: This program is similar to SPEED_1. The differences between the
; two are that this one uses a different algorithm to process
; the command line, and it fetches the start time at a more appropriate
; place in the program.

; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.

; Function:
; Spawn a process and calculate the spawned program's load and execution
; times. Pause for a keypress before terminating.

release_excess_memory:
 lea       program_end, a0     ; Put "end of program" address in A0.
 movea.l   4(a7), a1           ; Put "basepage" address in A1.
 movea.l   a1, a4              ; Copy to A4 for command line access.
 trap      #6                  ; Calculate program size and release memory.

; NOTE: A local stack is not declared in PRG_5AP.TOS.
; Because of the long string that is printed by that program, this program
; will bomb when it spawns PRG_5AP.TOS, if a local stack is not declared here.

 lea       stack, a7           ; Point A7 to this program's stack.

; NOTE ABOUT THE COMMAND LINE PROCESSING ALGORITHM
; Refer to figure 2.13 of chapter 2 for an image of a command line
; that is stored in a program's basepage. The first byte of the command
; line is a count of the ASCII characters contained therein. The second
; byte is the first character in the command line. The last character in
; the command line is followed by the ASCII code for a carriage return;
; the carriage return is not included in the character count.
; For program SPEED_2 we know that the command line character count
; cannot exceed 12 characters = 12 bytes = 3 longwords. Therefore, it
; would be convenient if those 3 longwords could be transferred directly to
; three data registers. Unfortunately, the MC68000 will not permit the
; movem instruction to transfer data which begins at an odd address.
; Because of this restriction, it would be convenient if the operating
; system stored the first command line character at an even address.
; Unfortunately, it does not. Therefore, we are forced to fetch 4 longwords
; from the vicinity of the command line. That's why we must use four data
; registers instead of three.
; To complicate things, the command line ASCII string will be corrupted
; by the first byte in the first register, because it is the character count,
; not a valid character. So, when the data contained in the data registers
; are transferred to a declared variable location, this byte must be stripped
; from the command line ASCII string.
; I accomplish this with no wasted time by declaring two variable
; locations, input_line and program_name.
; Since input_line is one byte in length, and the first location for
; program_name immediately follows that byte, when the contents of the data
; registers are moved to the location of input_line, the variable
; program_name will point to the first character of the command line ASCII
; string, as it should.
; The carriage return at the end of the ASCII string is also transferred
; to the 15 byte array addressed by program_name. It must be overwritten by
; a NULL so that the ASCII string is NULL terminated. That is accomplished
; by fetching the command line character count as a byte length value,
; extending it to word length and using the result in an operand that uses
; "address register indirect with index" addressing.

process_command_line:
 lea       input_line, a3      ; Fetch location to contain command line.
 lea       output_line, a5     ; A second location: for filename.
 movem.l   $80(a4), d0-d3      ; Move 16 bytes of command line to 4 registers.
 movem.l   d0-d3, (a3)         ; Move them to address "input_line".
 movem.l   d0-d3, (a5)         ; Move them to address "output_line".
 move.b    $80(a4), d0         ; Fetch command line ASCII character count.
 ext.w     d0                  ; Extend to word for next instruction.
 move.b    #0, 1(a3,d0.w)      ; Store a null at end of command line input.
 move.b    #0, 1(a5,d0.w)      ; Same for filename buffer.
insert_filename_suffix:
 move.b    #$44, -2(a5,d0.w)   ; Insert letter 'D'.
 move.b    #$41, -1(a5,d0.w)   ; Insert letter 'A'.
 move.b    #$54, 0(a5,d0.w)    ; Insert letter 'T'.
create_file:
 move.w    #0, -(sp)           ; File attribute = read/write.
 pea       filename            ; Will be name of spawned process + .DAT.
 move.w    #$3C, -(sp)         ; Function = f_create = GEMDOS $3C.
 trap      #1                  ; File handle is returned in D0.
 addq.l    #8, sp
 lea       file_handle, a0
 move.w    d0, (a0)
redirect_output:               ; Exchange file handle with screen's handle.
 move.w    file_handle, -(sp)  ; This is the disk file's handle.
 move.w    #1, -(sp)           ; This is the video screen's handle.
 move.w    #$46, -(sp)         ; Function = f_force = GEMDOS $46.
 trap      #1
 addq.l    #6, sp

; NOTE: In order to increase the accuracy of the start time, the stack is
; prepared for the spawning process, then, just before trap #1 is
; invoked, custom trap #3 is invoked and the start time is saved.

prepare_stack_for_load_and_execute_program:
 pea       environ_string
 pea       command_line
 pea       program_name
 move.w    #0, -(sp)
 move.w    #$4B, -(sp)         ; Function = GEMDOS $4B = p_exec.
get_start_time:
 lea       start_time, a3      ; Fetch address of variable "start_time".
 trap      #3                  ; Returns value of system clock in D0.
 move.w    d0, (a3)            ; Save start time.
load_and_execute_program:
 trap      #1
 move.w    d0, d3              ; Copy after-load value to D3 for calculation.
get_end_time:
 trap      #3                  ; Returns value of system clock in D0.
 move.w    d0, d5              ; Copy to D5 for calculation.
 sub.w     d3, d5              ; Subtract after-load time from end time.
 ext.l     d5                  ; Extend to 32 bits.
reposition_stack_pointer:
 lea       $10(sp), sp
get_drive:
 move.w    #$19, -(sp)         ; Function = dgetdrv = GEMDOS $19.
 trap      #1
 addq.l    #2, sp
 add.b     #'A', d0
 lea       drive, a0
 move.b    d0, (a0)
print_heading:
 lea       heading, a0
 bsr       print_string
 lea       program_name, a0
 bsr       print_string
print_drive_for_spawned_program:
 lea       drive_msg, a0
 bsr       print_string
compute_load_time:
 lea       load_time_msg, a0
 bsr       print_string
 lea       start_time, a3
 sub.w     (a3), d3            ; Subtract start time from after-load time.
 ext.l     d3                  ; Extend to 32 bits.
 trap      #9                  ; See description in TRAPS.S.
close_file:
 move.w    file_handle, -(sp)
 move.w    #$3E, -(sp)         ; Function = fclose = GEMDOS $3E.
 trap      #1
 addq.l    #4, sp
terminate:
 move.w    #0, -(sp)
 trap      #1

print_string:                  ; Expects address of string to be in A0.
 move.l    a0, -(sp)           ; Push address of string onto stack.
 move.w    #9, -(sp)           ; Function = c_conws = GEMDOS $9.
 trap      #1                  ; GEMDOS call
 addq.l    #6, sp              ; Reset stack pointer to top of stack.
 rts

 data
heading:        dc.b $D,$A,"SPEED_2.TTP Execution Results",$D,$A
                dc.b "for ",0
drive_msg:      dc.b ", loaded from drive: "
drive:          dc.b "A",$D,$A,0
load_time_msg:  dc.b $D,$A," Load time: ",0
environ_string: dc.b "TERM",0
command_line:   dc.b 0
 align
 bss
start_time:     ds.w 1
file_handle:    ds.w 1
input_line:     ds.b 1
program_name:   ds.b 15        ; Program name buffer.
output_line:    ds.b 1
filename:       ds.b 15        ; Filename buffer.
                ds.l 96        ; Program stack.
stack:          ds.l 0         ; Address of program stack.
program_end:    ds.l 0
 end

SPEED_2.TTP Execution Results

PRG_5AP.TOS Execution Results

 When executed from the desktop, this program will print this string on the
 video screen and pause for a keypress. But, when this program is spawned by
 SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
 named PRG_5AP.DAT and the program will not pause for a keypress.

SPEED_2.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G

 Load time: 45 milliseconds
 Execute time: 680 milliseconds

Program 19 serves as the final model to be considered in the development of SPEEDTST.TTP. Within it, a new command line processing algorithm is developed and the user declared stack is discarded. As in the two previous models, the command line algorithm must prepare two strings: one to be used as the name of a disk file, the other to be passed as a parameter when GEMDOS $4B is invoked. The latter string is also used as part of the utility's header.

The algorithm in this model takes advantage of the presence of the string in the command line to eliminate movement to prepare the latter string. Instead, the string is altered in place, at its location in the basepage. The movement which is required is a prerequisite for preparation of the filename string.

Program 19. The final program model in SPEEDTST.TTP's development.

; Program Name: SPEED_3.S
; Version 1.004

; NOTE: This program is similar to SPEED_2.
; The differences are that a different algorithm is used to process the
; command line, and no user stack is declared in SPEED_3.

; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TTP extension.

; Function:
; Spawn a process and calculate the spawned program's load and execution
; times. Pause for a keypress before terminating.

release_excess_memory:
 lea       program_end, a0     ; Put "end of program" address in A0.
 movea.l   4(a7), a1           ; Put "basepage" address in A1.
 lea       $80(a1), a3         ; Put "command line" address in A3.
 trap      #6                  ; Calculate program size and release memory.

; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
; string that is printed by that program, this program will bomb when
; it spawns PRG_5AP.TOS, if a local stack is not declared here.

 lea       stack, a7           ; Point A7 to this program's stack.

; COMMAND LINE PROCESSING NOTE
; At this point register A3 contains the address of the command line.
; In the algorithm below, the address of the first ASCII character in the
; command line input is stored at the pointer "program_name". Then a NULL
; character is written over the carriage return code at the end of the
; command line input. Thus the command line input itself becomes the
; string, the address of which must be pushed on the stack during the p_exec
; invocation.
; Even though register A3 contains the address of the program name string,
; and the contents of A3 can be pushed during the p_exec invocation, the
; address of the string must be stored in a declared location because
; register A3 might be used by the spawned program. And the address of the
; string is still needed to print the spawned program's name in SPEED_3's
; output heading.

process_command_line:
 lea       command_line, a4    ; Fetch location to contain command line.
 movem.l   (a3), d0-d3         ; Move 16 bytes of command line to 4 registers.
 movem.l   d0-d3, (a4)         ; Move them to address "command_line".
 move.b    (a3)+, d0           ; Fetch parameter line input character count.
 ext.w     d0                  ; Extend to word for next instruction.
 move.b    #0, 1(a4,d0.w)      ; Store a null at end of string.
 lea       program_name, a0    ; Fetch address of pointer to program name.
 move.l    a3, (a0)            ; Store address of program name in pointer.
 move.b    #0, 0(a3,d0.w)      ; Replace $0D at end of program name with NULL.
insert_filename_suffix:
 move.b    #$44, -2(a4,d0.w)   ; Insert letter 'D'.
 move.b    #$41, -1(a4,d0.w)   ; Insert letter 'A'.
 move.b    #$54, 0(a4,d0.w)    ; Insert letter 'T'.
create_file:
 move.w    #0, -(sp)           ; File attribute = read/write.
 pea       filename            ; Will be name of spawned process + .DAT.
 move.w    #$3C, -(sp)         ; Function = f_create = GEMDOS $3C.
 trap      #1                  ; File handle is returned in D0.
 addq.l    #8, sp
 lea       file_handle, a0
 move.w    d0, (a0)
redirect_output:               ; Exchange file handle with screen's handle.
 move.w    file_handle, -(sp)  ; This is the disk file's handle.
 move.w    #1, -(sp)           ; This is the video screen's handle.
 move.w    #$46, -(sp)         ; Function = f_force = GEMDOS $46.
 trap      #1
 addq.l    #6, sp
prepare_stack_for_load_and_execute_program:
 pea       environ_string
 pea       command_string
 pea       (a3)                ; Push address of program name string.
 move.w    #0, -(sp)
 move.w    #$4B, -(sp)         ; Function = GEMDOS $4B = p_exec.
get_start_time:
 lea       start_time, a3      ; Fetch address of variable "start_time".
 trap      #3                  ; Returns value of system clock in D0.
 move.w    d0, (a3)            ; Save start time.
load_and_execute_program:
 trap      #1
 move.w    d0, d3              ; Copy after-load value to D3 for calculation.
get_end_time:
 trap      #3                  ; Returns value of system clock in D0.
 move.w    d0, d5              ; Copy to D5 for calculation.
 sub.w     d3, d5              ; Subtract after-load time from end time.
 ext.l     d5                  ; Extend to 32 bits.
reposition_stack_pointer:
 lea       $10(sp), sp
get_drive:
 move.w    #$19, -(sp)         ; Function = dgetdrv = GEMDOS $19.
 trap      #1
 addq.l    #2, sp
 add.b     #'A', d0
 lea       drive, a0
 move.b    d0, (a0)
print_heading:
 lea       heading, a0
 bsr       print_string
 lea       program_name, a0    ; Fetch address of program name string.
 movea.l   (a0), a0
 bsr       print_string        ; Print spawned program's name.
print_drive_for_spawned_program:
 lea       drive_msg, a0       ; Print drive from which spawned program was
 bsr       print_string        ; loaded.
compute_load_time:
 lea       load_time_msg, a0
 bsr       print_string
 lea       start_time, a3
 sub.w     (a3), d3            ; Subtract start time from after-load time.
 ext.l     d3                  ; Extend to 32 bits.
 trap      #9                  ; See description in TRAPS.S.
close_file:
 move.w    file_handle, -(sp)
 move.w    #$3E, -(sp)         ; Function = fclose = GEMDOS $3E.
 trap      #1
 addq.l    #4, sp
terminate:
 move.w    #0, -(sp)
 trap      #1

print_string:                  ; Expects address of string to be in A0.
 move.l    a0, -(sp)           ; Push address of string onto stack.
 move.w    #9, -(sp)           ; Function = c_conws = GEMDOS $9.
 trap      #1                  ; GEMDOS call
 addq.l    #6, sp              ; Reset stack pointer to top of stack.
 rts

 data
heading:        dc.b $D,$A,"SPEED_3.TTP Execution Results",$D,$A
                dc.b "for ",0
drive_msg:      dc.b ", loaded from drive: "
drive:          dc.b "A",$D,$A,0
load_time_msg:  dc.b $D,$A," Load time: ",0
environ_string: dc.b "TERM",0
command_string: dc.b 0
 align
 bss
start_time:     ds.w 1
program_name:   ds.l 1         ; Pointer to string in basepage command line.
file_handle:    ds.w 1
command_line:   ds.b 1         ; Unused character count will go here.
filename:       ds.b 15        ; File name for redirected output.
                ds.l 96        ; Program stack.
stack:          ds.l 0         ; Address of program stack.
program_end:    ds.l 0
 end

SPEED_3.TTP Execution Results

PRG_5AP.TOS Execution Results

 When executed from the desktop, this program will print this string on the
 video screen and pause for a keypress. But, when this program is spawned by
 SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file
 named PRG_5AP.DAT and the program will not pause for a keypress.

SPEED_3.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G

 Load time: 40 milliseconds
 Execute time: 685 milliseconds

Each of the three programs, SPEED_1, SPEED_2 and SPEED_3, is a model which fixes attention on a particular phase of a continuous development cycle.
It would be very difficult, if not impossible, to pause as each instruction of each algorithm is chosen in order to describe the creative processes which instigate the choice. Furthermore, the algorithmic development process is rhythmically recursive. At intervals, the duration of which is dictated by personal education and experience, the programmer is drawn back to the beginning of the process to verify what has been done and, perhaps, to refine portions of the product.

The final stage of the development process involves an assimilation of the best features of the three programs into a utility that is as fast as possible, while consuming minimum requisite memory. In order to choose the best command line processing algorithm from the three that were introduced in the models, a comparison of their relative speeds and requisite memory is needed. Program 20 was written to perform that chore.

Program 20. This program was used to compare the speeds of the command line processing algorithms used in programs 16, 18 and 19.

; Program Name: CMD_TEST.S
; Version 1.004

; Assembly Instructions:
; Assemble in "PC-relative" mode and save with a TOS extension.

; Execution Instructions:
; Execute program CMD_TEST.TOS from the desktop. After reading the
; program's output on the screen, terminate execution by pressing the
; Return key.

; Function:
; This program is used to compare the relative speed of the command
; line processing algorithms used in SPEED_1, SPEED_2 and SPEED_3.

; Description:
; Three command line processing algorithms are executed 10,000 times.
; The elapsed time and requisite memory for each algorithm are printed to
; the screen. So that this program need not be executed as a TTP program,
; the command line is salted with a declared string.

release_excess_memory:
 lea       program_end, a0     ; Put "end of program" address in A0.
 movea.l   4(a7), a1           ; Put "basepage" address in A1.
 movea.l   a1, a5              ; Copy to A5 for command line access.
 trap      #6                  ; Calculate program size and release memory.
 lea       stack, a7           ; Point A7 to this program's stack.

mainline:
 lea       heading, a0
 bsr       print_string
salt_command_line:
 lea       salt, a0            ; Fetch pointer to ersatz command line input.
 movem.l   (a0), d0-d3         ; Move it to registers.
 movem.l   d0-d3, $80(a5)      ; Copy to actual command line address.

speed_1_algorithm:
 lea       speed_1_msg, a0
 bsr       print_string
 move.l    #9999, d4           ; Initialize counter for 10000 executions.
 trap      #3                  ; Get start time.
 move.l    d0, d5              ; Copy for calculations.
speed_1_loop:
 lea       $80(a5), a4         ; Fetch address of parameters.
 move.b    (a4)+, d0           ; Fetch parameter line character count.
 lea       program_name_1, a3  ; Load buffer address.
 subq.b    #1, d0              ; Set up counter.
 ext.w     d0                  ; Extend to match the size of the dbra
                               ; instruction.
fetch_character:
 move.b    (a4)+, (a3)+        ; Store character.
 dbra      d0, fetch_character ; Loop until D0 becomes negative.
 move.b    #0, (a3)            ; Finish with a NULL.
create_file_name:              ; Create a file to accept standard output.
 lea       filename_1, a4      ; Load buffer address.
 lea       program_name_1, a3  ; Load buffer address.
copy_name:
 move.b    (a3)+, (a4)+
 cmpi.b    #$2E, (a3)          ; Is next byte of program_name the period?
 bne.s     copy_name           ; Continue looping until period is seen.
 move.b    #$2E, (a4)+         ; Add a period.
 move.b    #$44, (a4)+         ; Add letter 'D'.
 move.b    #$41, (a4)+         ; Add letter 'A'.
 move.b    #$54, (a4)+         ; Add letter 'T'.
 move.b    #0, (a4)            ; Add a NULL.
speed_1_memory:
 dbra      d4, speed_1_loop    ; Loop until D4 becomes negative.
 trap      #3                  ; Get end time.
 bsr       convert_and_print_time

speed_2_algorithm:
 lea       speed_2_msg, a0
 bsr       print_string
 move.l    #9999, d4           ; Initialize counter for 10000 executions.
 trap      #3                  ; Get start time.
 move.l    d0, d5              ; Copy for calculations.
speed_2_loop:
 lea       input_line, a3      ; Fetch location to contain command line.
 lea       output_line_2, a4   ; A second location: for filename.
 movem.l   $80(a5), d0-d3      ; Move 16 bytes of command line to 4 registers.
 movem.l   d0-d3, (a3)         ; Move them to address "input_line".
 movem.l   d0-d3, (a4)         ; Move them to address "output_line".
 move.b    $80(a5), d0         ; Fetch command line ASCII character count.
 ext.w     d0                  ; Extend to word for next instruction.
 move.b    #0, 1(a3,d0.w)      ; Store a null at end of command line input.
 move.b    #0, 1(a4,d0.w)      ; Same for filename buffer.
insert_filename_suffix:
 move.b    #$44, -2(a4,d0.w)   ; Insert letter 'D'.
 move.b    #$41, -1(a4,d0.w)   ; Insert letter 'A'.
 move.b    #$54, 0(a4,d0.w)    ; Insert letter 'T'.
speed_2_memory:
 dbra      d4, speed_2_loop    ; Loop until D4 becomes negative.
 trap      #3                  ; Get end time.
 bsr       convert_and_print_time

speed_3_algorithm:
 lea       speed_3_msg, a0
 bsr       print_string
 move.l    #9999, d4           ; Initialize counter for 10000 executions.
 lea       $80(a5), a5         ; Fetch command line address.
 trap      #3                  ; Get start time.
 move.l    d0, d5              ; Copy for calculations.
speed_3_loop:

; NOTE: The first instruction, below, is not used in the actual SPEED_3
; algorithm, but it must be included here to reset A3 to the
; correct address each time through the loop. This instruction
; adds 4 clock periods per loop, 40000 clock periods for the
; 10000 loops, which is 5 milliseconds. The accuracy of this error
; calculation was confirmed by executing CMD_TEST.TOS with and
; without the instruction in the loop. The 5 msec error is equal to
; one system clock tick, therefore, when the loop end-time is obtained
; with the trap #3 invocation, 1 clock tick is subtracted before the
; loop time is calculated.
; The memory occupied by this instruction is not included in the
; value reported for the algorithm's requisite memory.

 movea.l   a5, a3
start_memory:
 lea       output_line_3, a4   ; Fetch location to contain command line.
 movem.l   (a3), d0-d3         ; Move 16 bytes of command line to 4 registers.
 movem.l   d0-d3, (a4)         ; Move them to address "command_line".
 move.b    (a3)+, d0           ; Fetch command line ASCII character count.
 ext.w     d0                  ; Extend to word for next instruction.
 move.b    #0, 1(a4,d0.w)      ; Store a null at end of string.
 lea       program_name_ptr, a0 ; Fetch address of pointer to program name.
 move.l    a3, (a0)            ; Store address of filename string in pointer.
 move.b    #0, 0(a3,d0.w)      ; Replace $0D at end of program name with NULL.
_insert_filename_suffix:
 move.b    #$44, -2(a4,d0.w)   ; Insert letter 'D'.
 move.b    #$41, -1(a4,d0.w)   ; Insert letter 'A'.
 move.b    #$54, 0(a4,d0.w)    ; Insert letter 'T'.
speed_3_memory:
 dbra      d4, speed_3_loop
 trap      #3
 subq.w    #1, d0              ; Subtract 1 clock tick to correct time.
 bsr.s     convert_and_print_time

speed_1_requisite_memory:
 lea       speed_1_memory_msg, a0
 bsr.s     print_string
 lea       speed_1_loop, a1    ; Calculate number of bytes occupied by the
 lea       speed_1_memory, a0  ; instructions in the loop, then print.
 bsr.s     calculate_and_print_requisite_memory
speed_2_requisite_memory:
 lea       speed_2_memory_msg, a0
 bsr.s     print_string
 lea       speed_2_loop, a1    ; Calculate number of bytes occupied by the
 lea       speed_2_memory, a0  ; instructions in the loop, then store.
 bsr.s     calculate_and_print_requisite_memory
speed_3_requisite_memory:
 lea       speed_3_memory_msg, a0
 bsr       print_string
 lea       start_memory, a1    ; Calculate number of bytes occupied by the
 lea       speed_3_memory, a0  ; instructions in the loop, then print.
 bsr.s     calculate_and_print_requisite_memory

wait_for_keypress:
 move.w    #8, -(sp)           ; Function = c_necin = GEMDOS $8.
 trap      #1                  ; GEMDOS call.
 addq.l    #2, sp              ; Reposition stack pointer at top of stack.
terminate:
 move.w    #0, -(sp)
 trap      #1

print_string:                  ; Expects address of string to be in A0.
 pea       (a0)                ; Push address of string onto stack.
 move.w    #9, -(sp)           ; Function = c_conws = GEMDOS $9.
 trap      #1                  ; GEMDOS call
 addq.l    #6, sp              ; Reset stack pointer to top of stack.
 rts

convert_and_print_time:
 sub.l     d5, d0              ; Subtract start time from end time.
 mulu      #5, d0              ; Convert to milliseconds.
 move.l    d0, d1              ; Convert to ASCII decimal.
 trap      #4
 bsr       print_string
 lea       time_label, a0
 bsr       print_string
 rts

calculate_and_print_requisite_memory:
 suba.l    a1, a0
 move.l    a0, d1              ; Transfer requisite memory for trap call.
print_speed_1_requisite_memory:
 trap      #4                  ; Returns address of decimal string in A0.
 bsr       print_string
 lea       memory_label, a0
 bsr       print_string
 rts

 data
salt:               dc.b $B,"PRG_5AP.TOS",$D,0,0,0,0
heading:            dc.b $D,$A,"CMD_TEST Execution Results",$D,$A,$D,$A,0
speed_1_msg:        dc.b " SPEED_1 algorithm time: ",0
speed_2_msg:        dc.b " SPEED_2 algorithm time: ",0
speed_3_msg:        dc.b " SPEED_3 algorithm time: ",0
time_label:         dc.b " milliseconds",$D,$A,0
speed_1_memory_msg: dc.b $D,$A," SPEED_1 algorithm requisite memory: ",0
speed_2_memory_msg: dc.b " SPEED_2 algorithm requisite memory: ",0
speed_3_memory_msg: dc.b " SPEED_3 algorithm requisite memory: ",0
memory_label:       dc.b " bytes",$D,$A,0
 align
 bss
program_name_1:     ds.l 4     ; Program name buffer for SPEED_1 algorithm.
filename_1:         ds.l 4     ; Filename buffer for SPEED_1 algorithm.
input_line:         ds.b 1     ; Command line buffer for SPEED_2 algorithm.
program_name_2:     ds.b 15    ; Program name buffer for SPEED_2 algorithm.
output_line_2:      ds.b 1     ; Second command line buffer for SPEED_2.
filename_2:         ds.b 15    ; Filename buffer for SPEED_2 algorithm.
program_name_ptr:   ds.l 4     ; Pointer to filename in command line for SPEED_3.
output_line_3:      ds.b 1     ; Command line buffer for SPEED_3 algorithm.
filename_3:         ds.b 15    ; Filename buffer for SPEED_3 algorithm.
                    ds.l 96    ; Program stack.
stack:              ds.l 0     ; Address of program stack.
program_end:        ds.l 0
 end

CMD_TEST Execution Results

 SPEED_1 algorithm time: 830 milliseconds
 SPEED_2 algorithm time: 350 milliseconds
 SPEED_3 algorithm time: 300 milliseconds

 SPEED_1 algorithm requisite memory: 60 bytes
 SPEED_2 algorithm requisite memory: 58 bytes
 SPEED_3 algorithm requisite memory: 52 bytes

Authenticating The Results

Because the final configuration of the utility will depend primarily on the results displayed by CMD_TEST.TOS, the validity of those results must be beyond question. I have used three validation techniques. First, I single-stepped through each instruction.
Then I verified the data written by the program to its basepage command line and bss section. Finally, I compared the execution times reported to values calculated using the Motorola Programmer's Reference Manual.

Figure 5.1 is a partial disassembly of program 20 as it was in memory after execution. There you can see that the basepage command line contains the salt data and has been altered as specified by the SPEED_3 command line processing algorithm. To wit: the carriage return has been replaced by a NULL. Also evident are the strings stored in the bss segment by the three algorithms. Table 5.1 lists the relevant declared variables, their lengths and their addresses in the disassembly listing.

Table 5.1 Match the variable names listed and their addresses to the data shown in the disassembly listing.

 Variable            Length     Address
 program_name_1:     ds.l 4     $0919C8
 filename_1:         ds.l 4     $0919D8
 input_line:         ds.b 1     $0919E8
 program_name_2:     ds.b 15    $0919E9
 output_line_2:      ds.b 1     $0919F8
 filename_2:         ds.b 15    $0919F9
 program_name_ptr:   ds.l 4     $091A08
 output_line_3:      ds.b 1     $091A0C
 filename_3:         ds.b 15    $091A0D

Figure 5.1. Partial disassembly of CMD_TEST.TOS after execution, showing the basepage command line, and the command line relevant portion of the bss section.

Table 5.2 lists the instructions used in each of the command line processing algorithms and their required execution clock periods as specified in the Motorola manual. I want to give you the values I calculated for two reasons; first, to show you that it can be done from the tables in the Motorola guide; second, to serve as verification that program 20 performs its task accurately. If you desire, you can use this table to practice your interpretation of the data in the Motorola tables. A short tutorial follows the table.

Table 5.2 The instructions used in command line processing algorithms of SPEED_1, SPEED_2 and SPEED_3.
 Instruction                         Clock Periods

speed_1_loop:
 lea       $80(a5), a4                8
 move.b    (a4)+, d0                  8
 lea       program_name_1, a3         8
 subq.b    #1, d0                     4
 ext.w     d0                         4
                                      Total = sum of 5 = 32
fetch_character:                      (There are 11 characters.)
 move.b    (a4)+, (a3)+              12
 dbra      d0, fetch_character       10/14
                                      Total = 11(12) + 10(10) + 14 = 246
 move.b    #0, (a3)                  12
create_file_name:
 lea       filename_1, a4             8
 lea       program_name_1, a3         8
                                      Total = sum of 3 = 28
copy_name:                            (There are 7 characters.)
 move.b    (a3)+, (a4)+              12
 cmpi.b    #$2E, (a3)                12
 bne.s     copy_name                 10/8
                                      Total = 7(24) + 6(10) + 8 = 236
 move.b    #$2E, (a4)+               12
 move.b    #$44, (a4)+               12
 move.b    #$41, (a4)+               12
 move.b    #$54, (a4)+               12
 move.b    #0, (a4)                  12
                                      Total = sum of 5 = 60
                                      Algorithm total = 32 + 246 + 28 + 236 + 60 = 602
speed_1_memory:
 dbra      d4, speed_1_loop

speed_2_loop:
 lea       input_line, a3             8
 lea       output_line_2, a4          8
 movem.l   $80(a5), d0-d3            16+8(4) = 48
 movem.l   d0-d3, (a3)                8+8(4) = 40
 movem.l   d0-d3, (a4)                8+8(4) = 40
 move.b    $80(a5), d0               12
 ext.w     d0                         4
 move.b    #0, 1(a3,d0.w)            18
 move.b    #0, 1(a4,d0.w)            18
insert_filename_suffix:
 move.b    #$44, -2(a4,d0.w)         18
 move.b    #$41, -1(a4,d0.w)         18
 move.b    #$54, 0(a4,d0.w)          18
                                      Algorithm total = sum of 12 = 250
speed_2_memory:
 dbra      d4, speed_2_loop

speed_3_loop:
 movea.l   a5, a3                     (not counted)
start_memory:
 lea       output_line_3, a4          8
 movem.l   (a3), d0-d3               12+8(4) = 44
 movem.l   d0-d3, (a4)                8+8(4) = 40
 move.b    (a3)+, d0                  8
 ext.w     d0                         4
 move.b    #0, 1(a4,d0.w)            18
 lea       program_name_ptr, a0       8
 move.l    a3, (a0)                  12
 move.b    #0, 0(a3,d0.w)            18
_insert_filename_suffix:
 move.b    #$44, -2(a4,d0.w)         18
 move.b    #$41, -1(a4,d0.w)         18
 move.b    #$54, 0(a4,d0.w)          18
                                      Algorithm total = sum of 12 = 214
speed_3_memory:
 dbra      d4, speed_3_loop

Instruction Execution Times Tutorial

In my copy of the M68000 Programmer's Reference Manual, which may not be the same as yours, MC68000 instruction execution times are presented in Appendix D. Times for the MC68008 are presented in Appendix E, and those for the MC68010/MC68012 are in Appendix F.
I am pointing out the locations for the other processors so that you can avoid them. When you are looking for MC68000 times, make sure that you are doing so in Appendix D.

The Introduction to the appendix contains information concerning wait states that is not applicable to the Atari ST. The only things in the introduction which concern us are the notes stating that the instruction execution times are given in terms of external (system) clock periods and that the number of periods includes instruction fetch and all applicable operand fetches and stores. The ST's clock period is 1 divided by 8,000,000 = .000000125 second = 1.25 x 10^-7 sec, because the system operates with an 8 megahertz (MHz) clock.

The first table, D-1, lists the Effective Address Calculation Times for the addressing modes. This table is one to which you must refer back when so directed by other tables. Reference is made to this table via a + sign following the number of clock periods given for a particular instruction. The reference indicates that you should go back to table D-1, fetch the appropriate time for the appropriate addressing mode and data length (byte, word or long) and add that time to the number of clock periods preceding the + sign.

The other tables list base times for the instructions; I say base because of the need to add an effective address time for many instructions. The tables are arranged so that data is presented for groups of similar instructions. You use these tables by finding the one which lists the instruction of interest; then, if a source operand is involved, you locate the row specified by the source operand, if there are rows of source operands; then, if a destination operand is involved, you locate the column specified by the destination operand, if there are columns of destination operands; then, locate the data at the row-column intersection; then, if a + sign follows the data, go back to table D-1, fetch the effective address time and add it to the data.
When you have done all of that, you will have the instruction's execution time in clock periods.

Not all instructions contain both a source operand and a destination operand. Not all of the tables explicitly reference both operand types. Not all tables list destination operands in columns; some of them list source operands in columns. Therefore, your mind must be on what you are doing when you are reading the tables.

For example, times for ADD/ADDA, AND, CMP/CMPA, DIVS, DIVU, EOR, MULS, MULU, OR and SUB are listed in table D-4. Following each time given is a + sign, therefore, an effective address time from table D-1 is needed for each item in the table. You might ask, "Does the data in table D-4 pertain to source operands or destination operands? Does the reference to table D-1 pertain to source operands or destination operands?" The answer to both questions is "Yes," because these instructions can be written so that the effective addresses which head the columned data in table D-4 can be either source or destination operands. To see what I mean, look at the Assembler Syntax for the ADD instruction. There you see the following notation:

 ADD <ea>,Dn
 ADD Dn,<ea>

The reason that Motorola's use of the term effective address is confusing is that, in their manual, all addressing modes are discussed as if the locations specified in operands are somehow implied by operand format. It seems as though the authors of the manual had originally intended that the term effective address be used to indicate a location specified by an operand to be ultimately found in memory external to the processor, in contrast to processor registers, which are internal addresses. But, in fact, even when discussing Register Direct Modes, the manual states, "These effective addressing modes specify that the operand is in one of the 16 multifunction registers." So, I say, let it all be effective addresses, as the authors apparently decided to do.
But then the descriptive "effective" is redundant, and it renders the instruction "add effective address calculation time", which is indicated by the + sign, ineffective. What that instruction should instruct one to do is this: for the appropriate operand, add the additional time indicated in table D-1 for the appropriate addressing mode. Of course, one must determine the appropriate operand and the appropriate addressing mode. But this must be done regardless of terminology. However, the manual does not make that clear, nor does it indicate the manner in which it can be accomplished. I shall.

Using instructions selected from those listed in table 5.2, I will conclude this tutorial by showing you how I obtained the execution times for those instructions; then I will show you how to obtain the time for at least one instruction from each of the Motorola tables which none of the instructions in table 5.2 access. I think that the exploration will be sufficiently comprehensive.

    lea input_line, a3

The lea instruction is found in table D-10, JMP, JSR, LEA, PEA, and MOVEM Instruction Execution Times. For all of these instructions, the destination operand is implicit: for JMP and JSR the destination is the program counter (PC); for LEA the destination is an address register; for PEA the destination is a stack; for MOVEM (M->R, memory to registers) the destination is a register group; for MOVEM (R->M, registers to memory) the destination is a group of memory addresses. This means that the columns containing times in the table refer to source operands.

The source operand for lea input_line, a3 is a label; therefore, the addressing mode used might seem to be absolute, but the program in which the instruction is used was assembled in AssemPro's PC-relative mode. Therefore, the addressing mode is program counter with displacement. The execution time for the instruction is found where the LEA row intersects with the d16(PC) column. The time is 8 clock periods.
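A count of clock periods such as this one converts to elapsed time through the ST's clock period of 1.25 x 10^-7 second. What follows is only a minimal sketch of that conversion, written in Python rather than assembly because the arithmetic is easier to check there. The names CLOCK_HZ, TICK_HZ, periods_to_seconds and periods_per_tick are labels invented for this illustration; they are not defined by the ST or by Motorola. The 200 Hz figure is the rate at which the system clock variable at $4BA, used for the measurements in this chapter, is incremented.

```python
# Assumed constants for this sketch (names are hypothetical):
CLOCK_HZ = 8_000_000   # the ST's 8 MHz system clock
TICK_HZ = 200          # rate at which the timer variable at $4BA is incremented


def periods_to_seconds(periods):
    """Convert a count of clock periods to seconds."""
    return periods / CLOCK_HZ


def periods_per_tick():
    """Clock periods that elapse during one increment of the 200 Hz timer."""
    return CLOCK_HZ // TICK_HZ


if __name__ == "__main__":
    # The 8 clock periods found for the lea instruction, expressed in seconds.
    print(periods_to_seconds(8), "second")
    # How many clock periods fit inside a single timer increment.
    print(periods_per_tick(), "clock periods per timer increment")
```

The second function gives a feel for scale: a great many clock periods pass between two increments of the timer, so a single instruction is far too brief to be seen by it.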
Eight clock periods translate to 8(.000000125 sec) = .000001 sec = 1 microsecond = .001 msec. As you can see, this is a very short period of time. It is not possible to measure times that are this short with a clock that has a resolution of 5 msec. That's why it is necessary to execute instructions and entire algorithms within loops that extend the time period being measured. A time period being measured with the system clock should be sufficiently long to render the 5 msec resolution of the clock insignificant.

Because the loops which execute the algorithms many times contain branching overhead, it is easier to compare relative execution times, instead of absolute execution times, when performing the comparisons with computer generated data. When absolute times are desired, it is easier to compute them using the tables in the Motorola manual.

    lea $80(a5), a4

Here the addressing mode of the source operand is address register indirect with displacement. The execution time for the instruction is found at the point of intersection specified by the LEA row and the d16(An) column. The time is 8 clock periods.

    movem.l $80(a5), d0-d3

The row labeled MOVEM M->R is specified for this instruction. Furthermore, this row is divided into two subrows: Word and Long. The instruction specifies a longword operation, so the Long subrow must be used. The source operand uses address register indirect with displacement addressing = d16(An). For this instruction, the data found at the intersection of the specified row and column is not the instruction execution time. Instead, there is a formula from which the execution time must be calculated. The parameter n specified by the formula is a variable for the number of registers specified in the instruction. In this case the transfer from memory is to use 4 registers. The instruction execution time is 16 + 8(4) = 48 clock periods.

    movem.l d0-d3, (a3)

Refer to the row labeled MOVEM R->M in the D-10 table.
The formula shown at the intersection of the Long subrow and (An) column is similar to that for the MOVEM M->R instruction. The instruction execution time is 8 + 8(4) = 40 clock periods.

    move.b (a4)+, d0

The execution times for move instructions are contained in two tables. The first table, D-2 (Move Byte and Word Instruction Execution Times), must be used for this instruction because a byte operation is specified. The addressing mode used by the source operand is address register indirect with postincrement. That used in the destination operand is data register direct. The instruction time of 8 clock periods is found at the intersection of the (An)+ row and the Dn column. Note that these tables are used for MOVE and MOVEA instructions.

    subq.b #1, d0

The table to use is D-5 (Immediate Instruction Execution Times). All of the instructions in this table require a source operand which uses the immediate data addressing mode. The three columns in the table specify permissible destination operands. In this case, the instruction specifies data register direct. At the intersection of the SUBQ row and op #, Dn column, for a byte size operation, the time given is 4 clock periods.

    ext.w d0

This instruction is found in table D-12 (Miscellaneous Instruction Execution Times). Although there are two subrows shown for the EXT row, the times for both are identical. This instruction requires no source operand, and the time is simply 4 clock periods.

    dbra d0, fetch_character

The DBcc instruction is used to control loop exits. Therefore, we are most often concerned with multiple executions of the instruction and with a sum of execution times. Also, the execution time of a single DBcc execution depends on the state of the condition code register (CCR) and the state of the loop counter when loop exit takes place. Loop exit is forced when the DBcc condition code becomes true or when the value in the counter becomes negative. Refer to table D-9 (Conditional Instruction Execution Times).
Note that the DBcc instruction is the only instruction in the table for which the displacement between the instruction and the destination does not affect the execution time. Depending on the manual you are using, the DBcc row may be divided into 2 or 3 subrows. Figure 5.4 shows the row divided into 3 subrows.

Figure 5.4. Subrows for the DBcc Instruction.

    Displacement                     Branch Taken    Branch Not Taken

    cc true                               -                 12
    cc false, Count Not Expired          10                  -
    cc false, Counter Expired             -                 14

The information contained in the second and third rows can be combined so that only one row need be used to express it. In that case, the second row would be:

    cc false                             10                 14

This makes sense because when cc is false the branch can be taken only if the count has not expired, while it cannot be taken if the count has expired.

Except for the DBT instruction, which never branches and never decrements, for any condition specified in a DBcc instruction (for DBRA = DBF, the condition is always false), a branch will be taken if the condition is false and the value in the counter has not become negative, and the execution time for the instruction will be 10 clock periods. If the condition becomes true, a branch will not be taken, and the execution time for the instruction will be 12 clock periods, regardless of the value in the counter. If the value in the counter becomes negative before the condition becomes true, then the execution time for the instruction will be 14 clock periods.

For a counter value n, if exit from the loop takes place because the condition becomes true, the DBcc instruction will be executed N + 1 times and the sum of DBcc instruction execution times will be (N)(10) + 12, where N is the number of branches which actually took place, not the value stored in the counter. The sum of execution times will be (n)(10) + 14 if exit from the loop takes place because the counter becomes negative.
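The two sums just given can be checked by stepping through the loop one DBcc execution at a time. The sketch below does that in Python, used here only because the bookkeeping is easy to verify there; it models nothing but the three times quoted from table D-9. The function name dbcc_loop_periods and its parameter cc_true_on_pass are hypothetical, invented for this illustration, and the counter is assumed to start at a non-negative value.

```python
BRANCH_TAKEN = 10   # cc false, count not expired: branch back to loop top
EXIT_CC_TRUE = 12   # cc true: fall through, counter untouched
EXIT_EXPIRED = 14   # cc false, counter expired: fall through


def dbcc_loop_periods(counter, cc_true_on_pass=None):
    """Sum the DBcc clock periods for one complete loop.

    counter          initial (non-negative) value of the loop counter, n
    cc_true_on_pass  1-based pass on which the condition tests true when
                     DBcc executes, or None if it never does (as for DBRA)
    """
    total = 0
    passes = 0
    while True:
        passes += 1
        if cc_true_on_pass is not None and passes == cc_true_on_pass:
            return total + EXIT_CC_TRUE      # exit because cc became true
        counter -= 1                         # cc false: decrement counter
        if counter < 0:
            return total + EXIT_EXPIRED      # exit because counter expired
        total += BRANCH_TAKEN                # branch taken, loop again


if __name__ == "__main__":
    # Counter of 3, condition never true: three branches, then expiry.
    print(dbcc_loop_periods(3))
    # Counter of 10, condition true on the third pass: two branches taken.
    print(dbcc_loop_periods(10, cc_true_on_pass=3))
```

With a counter of 3 and no true condition the function returns (3)(10) + 14, and with the condition true on the third pass it returns (2)(10) + 12, which are the two forms of the sum stated above.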
For the instruction being used as an example, n is one less than the number of characters in the string being copied, because the value in the counter must be one less than the number of times the loop is to be executed. There are 11 characters, so n equals 10. The condition for the DBRA instruction is never true, so exit from the loop can only take place when the value in the counter becomes negative. The sum of execution times for the instruction is (10)(10) + 14 = 114 clock periods.

    cmpi.b #$2E, (a3)

This instruction is found in table D-5 (Immediate Instruction Execution Times). This table was discussed in the section under subq.b #1, d0. The source operand must use, and does use, the immediate data addressing mode. Unlike that of the previously referenced instruction, the destination operand of this one uses the address register indirect addressing mode. And at the intersection of the CMPI.B row and op #, M column, we find that the instruction execution time of 8 clock periods is followed by a + sign. The + sign indicates a reference to table D-1 (Effective Address Calculation Times). But what value is it that we seek there?

Just under the heading for table D-5 is the statement that implies this information. The statement tells us that the time shown at the intersection is that which is required to fetch the immediate operand. We can deduce that the time we seek is that for the addressing mode of the destination operand. In table D-1, at the (An) row/byte size operation intersection we find the value 4, which means that we must add 4 clock periods to the 8 shown in table D-5. Thus the instruction execution time is 12 clock periods.

    bne.s copy_name

The Bcc instruction is listed in table D-9, the same table which lists the DBcc instruction.
The Bcc instruction also has a Branch Taken and a Branch Not Taken column; and like the DBcc instruction, the Bcc instruction's execution time depends on the state of the CCR; but unlike the DBcc instruction, it also depends on the size of the displacement between the instruction and the branch destination. For the instruction being discussed, the displacement is short = byte size. For a byte size displacement the execution time is 10 clock periods for a branch taken, 8 clock periods for a branch not taken.

There are two instructions within the SPEED_1 copy_name loop, each of which requires 12 clock periods per execution. The body of the loop is executed 7 times, and the bne.s instruction is executed 7 times. But the branch is taken only 6 times. The sum of the Bcc instruction execution times will be 6(10) + 8 = 68 clock periods.

    add.l d0, d5

Refer to table D-4, Standard Instruction Execution Times. There are two subrows, labeled according to the size of the operation. At the intersection of the Long subrow and the op<ea>, Dn column, there is this notation: 6(1/0)+**. Referring to the notes under the table, we find that the + means that we must fetch the address calculation time for the source operand, and the ** means that the 6 must be increased to 8 if the addressing mode of the source operand is register direct or immediate. Well, the addressing mode of the source operand is register direct, so the 6 becomes 8. Glancing back at table D-1, we see that the address calculation time for the register direct addressing mode is 0. Therefore, the execution time for the instruction is simply 8 clock periods.

    asl #2, d5

The execution times for SHIFT and ROTATE instructions are listed in table D-7. Using the formula shown at the intersection of the ASL instruction's Long subrow and the Register column, the calculated execution time for the example is 8 + 2(2) = 12 clock periods. Here I have replaced n with the immediate value of the source operand.
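The Bcc sum and the shift formula worked out above reduce to a few lines of arithmetic. The following is a minimal sketch in Python, chosen only for ease of checking; the function names are hypothetical, and the constants are the table values quoted in this section. The loop total it computes for copy_name (body plus branch) is derived here from the figures quoted above; it is not a number taken from table 5.2.

```python
BCC_SHORT_TAKEN = 10       # Bcc with byte displacement, branch taken
BCC_SHORT_NOT_TAKEN = 8    # Bcc with byte displacement, branch not taken


def bcc_short_total(executions):
    """Clock periods spent in a short Bcc that is executed `executions`
    times at the bottom of a loop and falls through only on the last."""
    return (executions - 1) * BCC_SHORT_TAKEN + BCC_SHORT_NOT_TAKEN


def loop_total(body_periods, executions):
    """Loop body executed `executions` times plus its closing short Bcc."""
    return executions * body_periods + bcc_short_total(executions)


def shift_long_register(count):
    """Long shift of a data register by `count` bits: 8 + 2n."""
    return 8 + 2 * count


if __name__ == "__main__":
    print(bcc_short_total(7))        # bne.s copy_name, executed 7 times
    print(loop_total(12 + 12, 7))    # two 12-period instructions per pass
    print(shift_long_register(2))    # long shift by 2 bits
```

Keeping the branch cost separate from the body cost mirrors the way the tables must be read: the body time is fixed for every pass, while the branch time depends on whether the pass is the last one.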
    seq (a0)

Refer to table D-6, Single Operand Instruction Execution Times. The Scc instruction row is divided into two subrows labeled Byte, False and Byte, True. So we see that the execution time depends on the state of the condition code, which is eq in the example, if the addressing mode of the operand is register direct. For all other modes, the execution time is 8 clock periods plus the address calculation time obtained from table D-1. The example operand's addressing mode is address register indirect, and in table D-1 the address calculation time for that mode is 4 clock periods for a byte size operation. The instruction execution time is 8 + 4 = 12 clock periods.

    bset #5, (sp)

Table D-8 lists the execution times for the Bit Manipulation instructions. For all of the instructions listed in the table, the bit to be manipulated is specified by the source operand; the location of the bit to be manipulated is specified by the destination operand. There are two major columns in this table: Dynamic and Static. The Dynamic major column is used if the number of the bit to be manipulated is specified with the contents of a register; the Static major column is used if the number of the bit to be manipulated is specified with immediate data, such as shown in the example.

Each of the major columns is composed of two minor columns. A Register minor column is used if the bit to be manipulated resides in a register; a Memory minor column is used if the bit to be manipulated resides in memory external to the processor. The bit to be manipulated in the example resides in a stack, which is memory external to the processor.

An operation size indicator for any of the instructions shown in this table would be redundant because the size of the operation must be long if the bit to be manipulated resides in a register and it must be byte otherwise. So at the intersection of BSET's Byte subrow and Static-Memory column we find the notation: 12(2/1)+.
Fetching the address calculation time for (sp) = (An) from table D-1, which is 4 for a byte size operation, and adding it to the 12, we calculate the example instruction time as 16 clock periods.

This concludes my Instruction Execution Times Tutorial. I have not dealt with table D-13, which lists the single instruction MOVEP, because this instruction is a little tricky. I will use this instruction in a later chapter, and I hope to remember to discuss its execution time then. Neither have I dealt with table D-14, which lists Exception Processing Execution Times, because they are so easily derived. For example, the execution time for any trap #n instruction is simply 34 clock periods.

Execution Speed Ratios

The execution speed ratios of figure 5.2 are obtained from the results of one execution of CMD_TEST.TOS. On subsequent executions the results for SPEED_1 were sometimes 835, and the results for SPEED_3 were sometimes 305. At times, both differences appeared simultaneously. These differences for multiple executions are to be expected because the system variable _hz_200 (memory location $4BA) is incremented at a rate of only 200 Hz, which means that the period between increments is 1/200 = .005 second = 5 milliseconds. This means that the variable measures time with a resolution of 5 milliseconds (msec).

Unexpectedly, the time for SPEED_2 rarely varied. At first, that made me wonder if I had made an error in its algorithm as it is in CMD_TEST.TOS, but I have checked extensively and found nothing wrong. However, I mention my concern, just so you'll know, although it does not really affect the decision concerning which algorithm to choose for SPEEDTST.TTP.

Figure 5.2. CMD_TEST.TOS execution speed ratios.

As you can see, SPEED_2's command line processing algorithm is about 2.37 times faster than SPEED_1's, while SPEED_3's is about 2.77 times faster.
    SPEED_1   830
    ------- = --- = 2.37
    SPEED_2   350

    SPEED_1   830
    ------- = --- = 2.77
    SPEED_3   300

    SPEED_2   350
    ------- = --- = 1.17
    SPEED_3   300

The execution speed ratios shown in figure 5.3 are obtained from the data in table 5.2. I have also checked and rechecked this data many times, but I warn you not to trust me, although I trust the data. Actually, the ratios below agree very closely with those of figure 5.2, especially when one considers the 5 msec resolution of the clock that is being used to measure execution time. In any case, we are much more interested in relative execution speeds than we are in absolute speeds.

Figure 5.3. Execution speed ratios calculated from instruction execution timing information in the Motorola manual.

    SPEED_1   602
    ------- = --- = 2.41
    SPEED_2   250

    SPEED_1   602
    ------- = --- = 2.81
    SPEED_3   214

    SPEED_2   250
    ------- = --- = 1.17
    SPEED_3   214

Putting the Pieces Together

The final algorithm is prepared by extracting the best algorithms from the three models, and installing the instructions implemented by custom trap #9. All of the programs of the series, SPEED_1.TTP, SPEED_2.TTP and SPEED_3.TTP, as well as programs PRG_5AP.TOS, CMD_TEST.TOS, TRAPS.S and TRAP_9.S along with all of the execution results are included in the documentation package for program 21. In addition, program 21 contains some documentation that was not previously disclosed.

Program 21. The final algorithm.

 ; Program Name: SPEEDTST.S
 ;      Version: 1.006

 ; Assembly Instructions:

 ;     Assemble in "PC-relative" mode and save with a TTP extension.

 ; Function:

 ;     Spawn the TOS or PRG process typed on the command line. Create a disk
 ; file which is to be identified by the name of the spawned program with a
 ; DAT suffix. The disk file is to reside in the same directory as does the
 ; spawned process.
 ;     Calculate the spawned program's load and execution times and store them
 ; in the file.
 ;     If the spawned process directs output to the video screen via
 ; GEMDOS function $9, redirect that output to the file.

 ; Execution Instructions:

 ;     SPEEDTST.TTP will not execute unless the custom traps in program
 ; TRAPS.PRG have previously been installed.

 ;     Execute from the desktop. Type the name of an executable file which
 ; has a TOS or PRG extension on SPEEDTST.TTP's input parameter line. The
 ; name of the program you type on the parameter line must be in the same
 ; directory as is SPEEDTST.TTP. The program must terminate with GEMDOS
 ; function $4C, and, via that function, it must pass to SPEEDTST.TTP the
 ; word length portion of the value that was in memory location $4BA
 ; immediately after it was loaded.
 ;     The longword value in $4BA can be obtained by invoking custom trap #3
 ; (get_time). SPEEDTST.TTP uses the word length portion of that value,
 ; which is returned in D0 by GEMDOS $4C, to calculate the spawned program's
 ; load and execution times.
 ;     If the spawned program contains any instructions that cause it to pause,
 ; such as those that wait for a keypress or some other event, those should be
 ; commented out, and the program should be assembled especially for the speed
 ; test. Otherwise the execution time computed by SPEEDTST.TTP will include
 ; the time that the spawned program was waiting for the event to occur.
 ;     If custom trap #8 is used to terminate the spawned program, the trap
 ; will execute a wait_for_keypress algorithm when the program is executed from
 ; the desktop, but it will omit the wait algorithm when the program is spawned
 ; by SPEEDTST.TTP. In addition, trap #8 will return the after-load value to
 ; SPEEDTST.TTP and terminate the spawned program with GEMDOS function $4C.
 ;     Both trap #8 and SPEEDTST.TTP require that the spawned program be
 ; initialized with custom trap #6 or a similar algorithm. See TRAPS.S for
 ; details about custom traps #6 and #8.

release_excess_memory:
        lea       -$82(pc), a3          ; Put "command line" address in A3.
        lea       -$80(a3), a1          ; Put "basepage" address in A1.
        lea       program_end, a0       ; Put "end of program" address in A0.
        trap      #6                    ; Calculate program size and release memory.

 ; NOTE: A local stack is not declared in PRG_5AP.TOS. Because of the long
 ;       string that is printed by that program, this program will bomb when
 ;       it spawns PRG_5AP.TOS, if a local stack is not declared here.

        lea       stack, a7             ; Point A7 to this program's stack.

process_command_line:
        lea       command_line, a4      ; Fetch location to contain command line.
        movem.l   (a3), d0-d3           ; Move 16 bytes of command line to 4 registers.
        movem.l   d0-d3, (a4)           ; Move them to address "command_line".
        move.b    (a3)+, d0             ; Fetch command line ASCII character count.
        ext.w     d0                    ; Extend to word for next instruction.
        move.b    #0, 1(a4,d0.w)        ; Store a null at end of string.
        lea       program_name, a0      ; Fetch address of pointer to command line.
        move.l    a3, (a0)              ; Store address of command line string at
                                        ; pointer.
        move.b    #0, 0(a3,d0.w)        ; Replace $0D at end of command line input
                                        ; in basepage with a NULL.
insert_filename_suffix:
        move.b    #$44, -2(a4,d0.w)     ; Insert letter 'D'.
        move.b    #$41, -1(a4,d0.w)     ; Insert letter 'A'.
        move.b    #$54, 0(a4,d0.w)      ; Insert letter 'T'.

create_file:
        move.w    #0, -(sp)             ; File attribute = read/write.
        pea       filename              ; Will be name of spawned process + .DAT.
        move.w    #$3C, -(sp)           ; Function = f_create = GEMDOS $3C.
        trap      #1                    ; File handle is returned in D0.
        addq.l    #8, sp
        lea       file_handle, a0       ; Store returned file handle.
        move.w    d0, (a0)

redirect_output:                        ; Exchange file handle with screen's handle.
        move.w    file_handle, -(sp)    ; This is the disk file's handle.
        move.w    #1, -(sp)             ; This is the video screen's handle.
        move.w    #$46, -(sp)           ; Function = f_force = GEMDOS $46.
        trap      #1
        addq.l    #6, sp

prepare_stack_for_load_and_execute_program:
        pea       environ_string
        pea       command_string
        pea       (a3)                  ; Push address of program name string.
        move.w    #0, -(sp)
        move.w    #$4B, -(sp)           ; Function = GEMDOS $4B = p_exec.
get_start_time:
        lea       start_time, a3        ; Fetch address of variable "start_time".
        trap      #3                    ; Returns value of system clock in D0.
        move.w    d0, (a3)              ; Save start time.

load_and_execute_program:
        trap      #1
        move.w    d0, d3                ; Copy after-load value to D3 for calculation.

get_end_time:
        trap      #3                    ; Returns value of system clock in D0.
        move.w    d0, d5                ; Copy to D5 for calculation.
        sub.w     d3, d5                ; Subtract after-load time from end time.
        ext.l     d5                    ; Extend to 32 bits.

reposition_stack_pointer:
        lea       $10(sp), sp

get_drive:
        move.w    #$19, -(sp)           ; Function = dgetdrv = GEMDOS $19.
        trap      #1                    ; Returns 0 for drive A, 1 for B, etc.
        addq.l    #2, sp
        add.b     #$41, d0              ; Add ASCII value for A to compute ASCII
        lea       drive, a0             ; letter code for the drive value returned.
        move.b    d0, (a0)              ; Save drive's ASCII letter code.

print_heading:
        lea       heading, a0
        bsr       print_string
        lea       program_name, a0      ; Fetch address of program name string.
        movea.l   (a0), a0
        bsr       print_string

print_drive_for_spawned_program:
        lea       drive_msg, a0
        bsr       print_string

compute_load_time:
        lea       load_time_msg, a0
        bsr       print_string
        lea       start_time, a3
        sub.w     (a3), d3              ; Subtract start time from after-load time.
        ext.l     d3                    ; Extend to 32 bits.

multiply_by_five:                       ; Convert to milliseconds.
        move.l    d3, d0                ; Save a copy to add.
        asl.l     #2, d3                ; Shift to multiply by 4.
        add.l     d0, d3                ; To complete multiplication by 5.

print_load_time:
        cmpi.l    #999, d3              ; If load time is less than 1000, then
        bgt       no_space              ; print a leading blank space for output
        lea       space, a0             ; alignment.
        bsr       print_string
        cmpi.l    #99, d3               ; If load time is less than 100, then
        bgt       no_space              ; print another leading blank space.
        lea       space, a0
        bsr       print_string
no_space:
        move.l    d3, d1                ; Copy load time to D1 for decimal conversion.
        trap      #4                    ; Returns address of decimal string in A0.
        bsr.s     print_string
        lea       units_label, a0
        bsr.s     print_string

compute_execution_time:                 ; D5 already contains the execution time.
        lea       execute_time_msg, a0  ; Here, it must only be multiplied by 5 to
        bsr.s     print_string          ; be converted to milliseconds.
        move.l    d5, d0                ; Save a copy to add.
        asl.l     #2, d5                ; Shift to multiply by 4.
        add.l     d0, d5                ; To complete multiplication by 5.

print_execution_time:
        cmpi.l    #999, d5              ; If execute time is less than 1000, then
        bgt       _no_space             ; print a leading blank space for output
        lea       space, a0             ; alignment.
        bsr       print_string
        cmpi.l    #99, d5               ; If execute time is less than 100, then
        bgt       _no_space             ; print another leading blank space.
        lea       space, a0
        bsr       print_string
_no_space:
        move.l    d5, d1                ; Copy execute time for decimal conversion.
        trap      #4                    ; Returns address of decimal string in A0.
        bsr.s     print_string
        lea       units_label, a0
        bsr.s     print_string

close_file:
        move.w    file_handle, -(sp)
        move.w    #$3E, -(sp)           ; Function = fclose = GEMDOS $3E.
        trap      #1
        addq.l    #4, sp

terminate:
        move.w    #0, -(sp)
        trap      #1

print_string:                           ; Expects address of string to be in A0.
        pea       (a0)                  ; Push address of string onto stack.
        move.w    #9, -(sp)             ; Function = c_conws = GEMDOS $9.
        trap      #1                    ; GEMDOS call
        addq.l    #6, sp                ; Reset stack pointer to top of stack.
        rts

        data
space:            dc.b    " ",0
heading:          dc.b    $D,$A,"SPEEDTST.TTP Execution Results",$D,$A
                  dc.b    "for ",0
drive_msg:        dc.b    ", loaded from drive: "
drive:            dc.b    "A",$D,$A,0
load_time_msg:    dc.b    $D,$A," Load time: ",0
execute_time_msg: dc.b    " Execution time: ",0
units_label:      dc.b    " milliseconds",$D,$A,0
environ_string:   dc.b    "TERM",0
command_string:   dc.b    0
        align

        bss
start_time:       ds.w    1             ; Value in $4BA just before spawning.
program_name:     ds.l    1             ; Pointer to name in basepage command line.
file_handle:      ds.w    1             ; Handle for the filename below.
command_line:     ds.b    1             ; Unused character count will go here.
filename:         ds.b    15            ; File name for redirected output.
                  ds.l    96            ; Program stack.
stack:            ds.l    0             ; Address of program stack.
program_end:      ds.l    0
        end

SPEEDTST.TTP Execution Results

PRG_5AP.TOS Execution Results

When executed from the desktop, this program will print this string on the video screen and pause for a keypress.
But, when this program is spawned by SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file named PRG_5AP.DAT and the program will not pause for a keypress.

SPEEDTST.TTP Execution Results
for PRG_5AP.TOS, loaded from drive: G

 Load time: 40 milliseconds
 Execution time: 685 milliseconds

The Second Utility

I conclude this chapter with a utility that spawns a program and creates a file for redirected output, but which does not measure load and execution times. This program is used when I want to save the output from a program in a disk file for documentation, leisurely viewing or for comparison with the output of one or more other programs. Program 22 is simply a subset of program 21.

Program 22. A program that simply spawns a process and saves its redirected output in a disk file.

 ; Program Name: SPAWN.S
 ;      Version 1.003

 ; Assembly Instructions:

 ;     Assemble in "PC-relative" mode and save with a TTP extension.

 ; Program Function:

 ;     Spawn the TOS or PRG process typed on the command line. Create a disk
 ; file which is to be identified by the name of the spawned program with a
 ; DAT suffix. The disk file is to reside in the same directory as does the
 ; spawned process.
 ;     If the program to be executed has any halt or wait instructions, such
 ; as wait for a keypress, etc., you must remember that execution of the
 ; spawned process will not terminate until those conditions are satisfied.

release_excess_memory:
        lea       -$82(pc), a3          ; Put "command line" address in A3.
        lea       -$80(a3), a1          ; Put "basepage" address in A1.
        lea       program_end, a0       ; Put "end of program" address in A0.
        trap      #6                    ; Calculate program size and release memory.
        lea       stack, a7             ; Point A7 to this program's stack.

process_command_line_parameters:
        lea       command_line, a4      ; Fetch location to contain command line.
        movem.l   (a3), d0-d3           ; Move 16 bytes of command line to 4 registers.
        movem.l   d0-d3, (a4)           ; Move them to address "command_line".
        move.b    (a3)+, d0             ; Fetch command line ASCII character count.
        ext.w     d0                    ; Extend to word for next instruction.
        move.b    #0, 1(a4,d0.w)        ; Store a null at end of string.
        lea       program_name, a0      ; Fetch address of pointer to command line.
        move.l    a3, (a0)              ; Store address of command line string at
                                        ; pointer.
        move.b    #0, 0(a3,d0.w)        ; Replace $0D at end of command line input
                                        ; in basepage with a NULL.
insert_filename_suffix:
        move.b    #$44, -2(a4,d0.w)     ; Insert letter 'D'.
        move.b    #$41, -1(a4,d0.w)     ; Insert letter 'A'.
        move.b    #$54, 0(a4,d0.w)      ; Insert letter 'T'.

create_file:
        move.w    #0, -(sp)             ; File attribute = read/write.
        pea       filename              ; Will be name of spawned process + .DAT.
        move.w    #$3C, -(sp)           ; Function = f_create = GEMDOS $3C.
        trap      #1                    ; File handle is returned in D0.
        addq.l    #8, sp
        lea       file_handle, a0       ; Store returned file handle to be used when
        move.w    d0, (a0)              ; the file is closed later.

redirect_output:                        ; Exchange file handle with screen's handle.
        move.w    d0, -(sp)             ; This is the disk file's handle.
        move.w    #1, -(sp)             ; This is the video screen's handle.
        move.w    #$46, -(sp)           ; Function = f_force = GEMDOS $46.
        trap      #1
        addq.l    #6, sp

load_and_execute_program:
        pea       environ_string
        pea       command_string
        pea       (a3)                  ; A3 contains address of program name string.
        move.w    #0, -(sp)             ; Load and Go option.
        move.w    #$4B, -(sp)           ; Function = GEMDOS $4B = p_exec.
        trap      #1
        lea       $10(a7), sp           ; Reposition stack pointer.

close_file:
        move.w    file_handle, -(sp)
        move.w    #$3E, -(sp)           ; Function = GEMDOS $3E = f_close.
        trap      #1
        addq.l    #4, sp

terminate:
        move.w    #0, -(sp)
        trap      #1

        data
environ_string:   dc.b    "TERM",0
command_string:   dc.b    0
        align

        bss
file_handle:      ds.w    1             ; Handle for the disk file named below.
command_line:     ds.b    1             ; Unused character count will go here.
filename:         ds.b    15            ; File name for redirected output.
program_name:     ds.l    1             ; Pointer to name in basepage command line.
                  ds.l    96            ; Program stack.
stack:            ds.l    0             ; Address of program stack.
program_end:      ds.l    0
        end

Execution results for PRG_5AP.TOS as a spawned process.
PRG_5AP.TOS Execution Results

When executed from the desktop, this program will print this string on the video screen and pause for a keypress. But, when this program is spawned by SPEED_1, SPEED_2, SPEED_3 or SPEEDTST, the string will be stored in a file named PRG_5AP.DAT and the program will not pause for a keypress.

Conclusion

Performance testing and utilities with which such testing may be accomplished have been the subject of this chapter. But the material in this chapter represents only a beginning. Software testing as a subject is complicated enough, but implementing such testing is a horrendous task. At this point, I have provided you with a few simple tools and a tutorial which should assist you in calculating instruction execution times.

I have said that single-stepping through a program with AssemPro's debugger is one method I use to verify a program's performance. The debugger permits one to view registers and memory locations while tracing through a program in this manner. For short, uncomplicated programs, if you are able to keep your wits sharp while doing so, this is a viable method of verification. But many programs cannot be tested within the debugger. Furthermore, it is virtually impossible to keep track of register and memory activity for larger programs. Therefore, programs which do this automatically will be introduced in a later chapter.

For now, it is time to take advantage of the two utilities introduced here to investigate the questions raised by material in earlier chapters. I do this in chapter 6. There I will compare programs assembled in each of three assembly modes, and I will compare the performance of certain instructions, so that you can see early on why I choose to use them in future programs.