home *** CD-ROM | disk | FTP | other *** search
Text File | 1993-10-23 | 89.1 KB | 2,172 lines |
- Atari ST Machine Specific Programming In Assembly
-
- Chapter 6: Selections Based On Speed
-
-
- Reducing Program Loading Time
-
- As I was concluding chapter 3, I intended to indicate
- that, in many ways, this chapter is an extension of that
- one. For example, the first thing that I shall do here is
- to resume the discussion of assembly modes. As I stated
- there, the first step involved in rapid execution is rapid
- loading. A program cannot even begin run until it has been
- moved from disk storage to ram.
- For programs which simply load, execute and terminate,
- without spending much time in ram, the time required to load
- the program may be a significant portion of the total time
- involved in program execution. Many of the programs that
- are executed from an AUTO folder during boot are of this
- type. It seems reasonable, therefore, that discussions
- which involve the reduction of loading time precede
- discussions involving the reduction of execution time.
- As I mentioned in program SPEED_1's documentation, the
- time required to load a particular program from disk to ram
- depends on the assembly mode which produced the code; the
- type of drive from which it is loaded, hard disk or floppy;
- the method used to format the disk, such as ST normal,
- Double Click's DC Formatter or David Small and Dan Moore's
- Mega-Twister; the position of the file on the disk, which
- could include fragmentation; and, if the program is spawned
- by another, the distance between parent and child, which
- could include intermedia displacement.
-
- The Effects of Assembly Modes
-
- Programs 23 and 24 have been prepared for use in
- comparing the load and execution times of a program
- assembled in three assembly modes. Program 23 should be
- assembled in the Relocatable mode to produce PRG_2CC.TOS.
- Program 24 should be assembled in AssemPro's PC-relative
- mode to produce PRG_2CP.TOS, then the name of the source
- should be changed to PRG_2CR, while it is still in the
- editor, so that it can be assembled in the Relocatable mode
- to produce PRG_2CR.TOS.
- I obtained the load and execute times for the three TOS
- programs using the following method to isolate the
- experiment from the intermedia displacement variables
- discussed above. Execution results follow the appropriate
- listings.
-
- 1. Copy SPEEDTST.TTP to a blank hard disk partition or
- floppy disk.
-
- 2, Copy PRG_2CC.TOS to the same hard disk partition or
- floppy disk.
-
- 3. Execute SPEEDTST.TTP and type PRG_2CC.TOS on the
- command line.
-
- 4. Copy PRG_2CC.DAT to a safe place.
-
- 5. Remove PRG_2CC.TOS and PRG_2CC.DAT from the
- partition or floppy.
-
- 6. Repeat steps 2 through 5 for PRG_2CP.TOS and
- PRG_2CR.TOS.
-
- Program 23. Source file for PRG_2CC.TOS
-
- ; Program Name: PRG_2CC.S
- ; Version: 1.002
-
- ; Assembly Instructions:
-
- ; Assemble in Relocatable mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute program SPEEDTST.TTP and type PRG_2CC.TOS on the input
- ; parameter line. SPEEDTST.TTP will produce a data file named PRG_2CC.DAT
- ; on disk. You will be able to compare the data for this program to that
- ; produced for programs PRG_2CP.TOS and PRG_2CR.TOS.
-
- ; Program Function:
-
- ; Statements within a nested loop structure are executed 50,000 times
- ; so that the load and execution time of this program can be compared with
- ; similar programs assembled in the PC-relative and Relocatable modes.
-
- store_after_load_time:
- trap #3 ; Returns value of system clock in D0.
- lea after_load_time(pc), a0
- move.w d0, (a0)
-
- move.w #9, d1 ; Initialize outer loop counter.
- outer_loop: ; Loop ten times.
- move.w #49999, d0 ; Initialize inner loop counter.
- inner_loop: ; Loop 50,000 times.
- move.l #label, a0 ; Can't use (pc) here.
- lea label(pc), a0
- move.l label(pc), a0
- move.l #label, -(sp) ; Can't use (pc) here.
- pea label(pc)
- move.l label(pc), -(sp)
- lea $C(sp), sp ; Reposition stack pointer to top of stack.
- dbra d0, inner_loop ; Loop back until D0 = -1.
- dbra d1, outer_loop ; Loop back until D1 = -1.
-
- terminate:
- move.w after_load_time(pc), -(sp) ; Pass after load time to SPEEDTST.TTP.
- move.w #$4C, -(sp) ; Function = p_term = GEMDOS $4C.
- trap #1
-
- data
-
- ; NOTE: Below, the variable "label" is supposed to be a pointer to the
- ; variable "after_load_time". If this program is assembled in
- ; Relocatable mode, the "run time" address of "after_load_time" will be
- ; stored in the 4 bytes declared at "label" when the program is loaded
- ; from disk to ram.
-
- ; But, if the program is assembled in PC-relative mode, the "run time"
- ; address will not be stored there; instead, the "assembly time" address
- ; will be stored in the 4 bytes. That is undesirable.
-
- label: dc.l after_load_time ; This works for COMBO assembly.
- bss
- after_load_time: ds.w 1
- end
-
-
- SPEEDTST.TTP Execution Results
- for PRG_2CC.TOS, loaded from drive: G
-
- Load time: 1995 milliseconds
- Execution time: 7485 milliseconds
-
-
- Program 24. Source file for PRG_2CP.TOS and
- PRG_2CR.TOS.
-
- ; Program Name: PRG_2CP.S
- ; Version: 1.002
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension, then
- ; change the name to PRG_2CR, assemble in Relocatable mode and save as
- ; PRG_2CR.TOS. This will allow you to prepare two object code files from
- ; the same source file.
-
- ; Execution Instructions:
-
- ; Execute program SPEEDTST.TTP and type PRG_2CP.TOS on the input
- ; parameter line. SPEEDTST.TTP will produce a data file named PRG_2CP.DAT
- ; on disk. You will be able to compare the data for this program to that
- ; produced for programs PRG_2CC.TOS and PRG_2CR.TOS.
-
- ; Do the same with PRG_2CR.TOS to produce a data file named PRG_2CR.DAT.
- ; You can view the DAT files with an editor, from the desktop using the Show
- ; function or by printing them.
-
- ; Program Function:
-
- ; Statements within a nested loop structure are executed 50,000 times
- ; so that the load and execution time of this program can be compared with
- ; similar programs assembled in the Relocatable and Combo modes.
-
- store_after_load_time:
- trap #3 ; Returns value of system clock in D0.
- lea after_load_time, a0
- move.w d0, (a0) ; Store time in variable "after_load_time".
- move.w #9, d1 ; Initialize outer loop counter.
- outer_loop: ; Loop ten times.
- move.w #49999, d0 ; Initialize inner loop counter.
- inner_loop: ; Loop 50,000 times.
- move.l #label, a0
- lea label, a0
- move.l label, a0
- move.l #label, -(sp)
- pea label
- move.l label, -(sp)
- lea $C(sp), sp ; Reposition stack pointer to top of stack.
- dbra d0, inner_loop ; Loop back until D0 = -1.
- dbra d1, outer_loop ; Loop back until D1 = -1.
-
- terminate:
- move.w after_load_time, -(sp) ; Pass after load time to SPEEDTST.TTP.
- move.w #$4C, -(sp) ; Function = p_term = GEMDOS $4C.
- trap #1
-
- data
-
- ; NOTE: Below, the variable "label" is supposed to be a pointer to the
- ; variable "after_load_time". If this program is assembled in
- ; Relocatable mode, the "run time" address of "after_load_time" will be
- ; stored in the 4 bytes declared at "label" when the program is loaded
- ; from disk to ram.
-
- ; But, if the program is assembled in PC-relative mode, the "run time"
- ; address will not be stored there; instead, the "assembly time" address
- ; will be stored in the 4 bytes. That is undesirable.
-
- label: dc.l after_load_time ; This does not give desired response for
- bss ; PC-relative assembly, it works for
- after_load_time: ds.w 1 ; Relocatable assembly.
- end
-
-
- SPEEDTST.TTP Execution Results
- for PRG_2CP.TOS, loaded from drive: G
-
- Load time: 35 milliseconds
- Execution time: 7475 milliseconds
-
-
- SPEEDTST.TTP Execution Results
- for PRG_2CR.TOS, loaded from drive: G
-
- Load time: 1995 milliseconds
- Execution time: 8510 milliseconds
-
-
- A comparison of load times is dramatic. When the
- program is assembled in AssemPro's PC-relative mode, the
- time required to load is 1960 msec (about 2 seconds) less
- than it is when the program is assembled in Relocatable
- mode. And, although the use of pc-relative addressing in a
- program that is assembled in Relocatable mode does not
- reduce load time, no less dramatic is its effect on
- execution time. Because for even so short a program as
- this, it reduces the execution time from 8510 msec to 7485
- msec, effecting a 1 second savings.
-
- Why PC-relative Programs Load Faster
-
- As I mentioned in chapter 2, part of the extra load
- time (or that which I have chosen to designate as load time,
- but which could as easily be called start-up time) required
- by programs assembled in the Relocatable mode is due to the
- task of program relocation itself. I have since received
- information from a reader which further clarifies the
- comparison between the two assembly modes. This reader is
- well known for his hard disk backup utility, TURTLE.PRG. I
- speak, of course, of George Woodside.
- I now paraphrase the information that Mr. Woodside was
- kind enough to share with me. It seems that when a
- Relocatable program is executed the operating system clears
- all unoccupied ram before the onset of execution. In
- addition, Mr. Woodside went on to explain that PC-relative
- LSR programs which are executed from the AUTO folder can't
- be deleted because the operating system does not close those
- files while they are still ram resident. I have mentioned
- this phenomenon elsewhere, but I have not known the reason
- until now.
- I had planned to write a few programs that would
- graphically display the information presented by Mr.
- Woodside, but something has occurred which probably renders
- such explorations irrelevant from my position. Since
- receiving his letter I have been forced to purchase a MEGA 2
- because I needed the additional memory. And, while the
- differences in loading and execution times are still
- noticeable, they are not as dramatic as they were with the
- 1040ST. Therefore, I simply rest the matter by saying that
- there is a difference and that difference is reported as I
- experienced it.
-
- The Effects of Floppy Disk Formatting Techniques
-
- The first thing I'm going to do in this section is to
- refer you to a couple of START magazine articles written by
- Dave Small and Dan Moore: Hard Disk Warfare, Spring, 1987
- and Let's Twist Again, Summer, 1988. Those articles will
- help you to understand why this section is included in the
- book. Briefly, there is more than one way to format floppy
- disks for the ST. I have designed two experiments which
- permit you to compare the effects of two non-standard
- methods to that of the standard ST method.
- I refer to the formatted disk you obtain when you
- format from the desktop by selecting the Format option under
- the File menu as standard ST. I refer to the formatted disk
- you obtain by any other method as non-standard. The non-
- standard formatting methods used in the experiment are Mega
- Twister, included on START magazine's Summer, 1988 disk and
- DC Formatter Version 2.2, included on ST Informer's PDM 188
- disk.
- The most extensive reference to DC Formatter that I
- could find occurs in the July/August 1988 issue of RESET
- magazine, on page 11, within the Public Beat column. If
- this article is not readily available to you, I suggest that
- you be not concerned about it. While the author's
- enthusiasm for DC Formatter is undoubtedly genuine, and,
- although I have found DC Formatter version 2.2 to format
- faster than Mega Twister (1 minute 7 seconds versus 1 minute
- 45 seconds), and, although the data in figures 6.1 and 6.2
- seem to confirm that DC Formatter may indeed write to
- floppies slightly faster than Mega Twister, the data does
- not confirm the author's apparent conclusion that DC
- Formatter is tremendously faster than Mega Twister.
- A new version of DC Formatter, Version 3.0, is
- discussed in the December, 1988 issue of ST Informer.
- Before deciding to use this method of formatting disks, I
- suggest that you read the information on page 33 of that
- issue, under the heading DCFRMT3.ARC. There you will find a
- statement concerning the incompatibility of the earlier
- version with GDOS. Furthermore, I suggest that you use any
- non-standard method of formatting disks with all of the
- caution the word non-standard connotes.
- Program 25 has been designed to permit a comparison of
- the load, write and read times associated with each disk
- formatting method. The steps used to isolate the program on
- the appropriate disks are discussed within the program's
- documentation. Program 25 invokes trap calls to custom trap
- #10, which is installed by executing TRAP_10.PRG. The
- source listing for TRAP_10.PRG follows the listing for
- program 25.
-
- Program 25. This program is used to compare the effects
- of three floppy disk formatting techniques.
-
- ; Program Name: PRG_2DP.S
- ; Version 1.003
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Note:
-
- ; This program invokes custom traps which must be installed by
- ; TRAPS.PRG and TRAP_10.PRG prior to its execution. Execute this program
- ; by typing its name on SPEEDTST.TTP's command line.
-
- ; But before executing this program, prepare three floppy disks. The
- ; first should be formatted from the desktop, using the ST formatting
- ; algorithm. The second should be formatted with a version of DCFORMAT,
- ; available from ST Informer, 909 NW Starlite Place, Grants Pass, OR 97526,
- ; (503)476-0071 on disk PDM 188 or PDM 1288. The price of each disk from
- ; ST Informer is $6.00 for non-subscribers. You can subscribe for $18.00
- ; per year and get a free disk coupon. I highly recommend a subscription.
-
- ; The third disk should be formatted with Dave Small and Dan Moore's
- ; Twister program available on START magazine's Summer, 1988 issue. You
- ; can find an order form in any issue of START or phone (800)234-7001.
-
- ; Copy PRG_2DP.TOS to each blank disk. Copy SPEEDTST.TTP to each disk.
- ; Execute SPEEDTST.TTP on each disk in turn, typing PRG_2DP.TOS on its
- ; command line. Compare the contents of the WRITE_1 and WRITE_2 files of
- ; each disk to verify that they are identical. Compare the PRG_2DP.DAT file
- ; on each disk to the others.
-
- ; Program Function:
-
- ; This program writes data to WRITE_1.DAT, reads WRITE_1.DAT, then
- ; writes what it has read to WRITE_2.DAT to confirm that it has correctly
- ; written and read the data declared within the data section of the program.
-
- ; SPEEDTST.TTP will report the load and total execution time for this
- ; program.
-
- ; Within the program, the time to write the data to WRITE_1.DAT will be
- ; calculated and reported; and the time to read the data from WRITE_1.DAT
- ; will also be calculated and reported; finally, the time to write the data
- ; to WRITE_2.DAT will be calculated and reported.
-
- ; This program is to be used to compare the write to and read from times
- ; involving three floppy disks, each of which has been formatted with a
- ; different formatting algorithm.
-
- fetch_load_time:
- trap #3 ; Returns value of system clock in D0.
- release_excess_memory: ; Also stores after-load time in TRAPS bss.
- lea program_end, a0 ; Put "end of program" address in A0.
- movea.l 4(a7), a1 ; Put "basepage" address in A1.
- trap #6 ; Calculate program size and release memory.
- lea stack, a7
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- fetch_write_start_time:
- trap #3
- lea write_start_time, a0
- move.w d0, (a0)
- create_file_1:
- move.w #0, -(sp) ; File attribute = read/write.
- pea file_1_name ; For WRITE_1.DAT.
- move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
- trap #1 ; File handle is returned in D0.
- addq.l #8, sp
- lea file_1_handle, a0 ; Store returned file handle.
- move.w d0, (a0)
-
- write_to_file_1:
- pea string ; Push address of buffer.
- move.l #451, -(sp) ; Number of bytes to write.
- move.w d0, -(sp) ; File handle to be written to.
- move.w #$40, -(sp) ; GEMDOS function = write.
- trap #1
- lea $C(sp), sp ; Reposition stack pointer to top of stack.
- close_file_1:
- move.w file_1_handle, -(sp)
- move.w #$3E, -(sp) ; Function = GEMDOS $3E = f_close.
- trap #1
- addq.l #4, sp
- get_end_time:
- trap #3
- sub.w write_start_time, d0 ; Subtract start time from end time.
- ext.l d0 ; Extend to 32 bits
- trap #10 ; Convert to milliseconds and print.
-
- set_dta:
- pea dta ; dta = address of 44 byte buffer.
- move.w #$1A, -(sp) ; GEMDOS function = set dta.
- trap #1
- addq.l #6, sp
- print_read_time_label:
- lea read_msg, a0
- bsr print_string
- fetch_read_start_time:
- trap #3
- lea read_start_time, a0
- move.w d0, (a0)
-
- ; NOTE: Reading is so fast, must loop to accumulate enough time to measure.
-
- move.w #99, d3 ; Set up counter for 100 loops.
- search_for_file:
- move.w #0, -(sp) ; Attribute = normal access.
- pea file_1_name ; Name of file to search for.
- move.w #$4E, -(sp) ; GEMDOS function = search first.
- trap #1
- addq.l #8, sp
- tst d0
- bne.s not_found
-
- read_WRITE_1_DAT:
- lea dta, a0
- pea buffer
- move.l $1A(a0), -(sp) ; Number of bytes to read.
- move.w file_1_handle, -(sp) ; File to read.
- move.w #$3F, -(sp) ; GEMDOS function = read.
- trap #1
- lea $C(sp), sp ; Reposition stack pointer.
- dbra d3, search_for_file
- _get_end_time:
- trap #3
- sub.w read_start_time, d0 ; Subtract start time from end time.
- ext.l d0 ; Extend to 32 bits
- trap #10 ; Convert to milliseconds and print.
-
- print_write_time_label:
- lea write_msg, a0
- bsr print_string
- _fetch_write_start_time:
- trap #3
- lea write_start_time, a0
- move.w d0, (a0)
- create_file_2:
- move.w #0, -(sp) ; File attribute = read/write.
- pea file_2_name ; For WRITE_2.DAT.
- move.w #$3C, -(sp) ; Function = f_create = GEMDOS $3C.
- trap #1 ; File handle is returned in D0.
- addq.l #8, sp
- lea file_2_handle, a0 ; Store returned file handle.
- move.w d0, (a0)
- write_to_file_2:
- lea dta, a0
- pea string ; Push address of buffer.
- move.l $1A(a0), -(sp) ; Number of bytes to write.
- move.w d0, -(sp) ; File handle to be written to.
- move.w #$40, -(sp) ; GEMDOS function = write.
- trap #1
- lea $C(sp), sp ; Reposition stack pointer to top of stack.
- close_file_2:
- move.w file_2_handle, -(sp)
- move.w #$3E, -(sp) ; Function = GEMDOS $3E = f_close.
- trap #1
- addq.l #4, sp
- _get__end_time:
- trap #3
- sub.w write_start_time, d0 ; Subtract start time from end time.
- ext.l d0 ; Extend to 32 bits
- trap #10 ; Convert to milliseconds and print.
-
- not_found:
- trap #8 ; Terminate.
-
- print_string:
- pea (a0)
- move.w #9, -(sp)
- trap #1
- addq.l #6, sp
- rts
-
- data
- file_1_name: dc.b 'WRITE_1.DAT',0
- file_2_name: dc.b 'WRITE_2.DAT',0
- heading: dc.b 'PRG_2DP.TOS Execution Results',$D,$A,$D,$A
- dc.b ' Time to create, write and close WRITE_1.DAT: ',0
- read_msg dc.b ' Time to read WRITE_1.DAT into buffer 10 times: ',0
- write_msg: dc.b ' Time to create, write and close WRITE_2.DAT: ',0
- string: dc.b ' This paragraph will be written to a disk file named '
- dc.b 'WRITE_1.DAT. The time ',$D,$A
- dc.b ' required to write the paragraph will be reported in '
- dc.b 'file PRG_2DP.DAT.',$D,$A
- dc.b ' Then the contents of WRITE_1.DAT will be read into a '
- dc.b 'buffer. The time ',$D,$A
- dc.b ' required to read the contents of the file will be '
- dc.b 'reported in file ',$D,$A
- dc.b ' PRG_2DP.DAT. Finally, the contents of the buffer '
- dc.b 'will be written to ',$D,$A
- dc.b ' WRITE_2.DAT so that what has been read can be compared '
- dc.b 'to what was written.',$D,$A,$1A
-
- ; NOTE: The ASCII code for ^Z (control Z) is normally used to mark the end
- ; of a file so that a program reading the file may look for that mark.
-
- bss
- align
- file_1_handle: ds.w 1
- file_2_handle: ds.w 1
- write_start_time: ds.w 1
- read_start_time: ds.w 1
- dta: ds.b 44
- buffer: ds.b 452
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0
- end
-
-
- Program 26. The custom trap #10 algorithm.
-
- ; Program Name: TRAP_10.S
- ; Version 1.001
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a PRG extension.
-
- ; Program Function:
-
- ; This is a LSR program that establishes a user defined trap. It may be
- ; executed from the desktop, but you may prefer to copy it to the AUTO
- ; folder of your boot partition or floppy disk so that it will execute
- ; automatically during boot.
-
- ; Trap #10 is used by programs which time an interval. This trap
- ; converts the interval passed in register D0 to milliseconds, then it
- ; prints the ASCII decimal value of that interval in milliseconds.
-
- ; This program invokes a custom trap that is established by TRAPS.PRG,
- ; therefore, that program must be executed before trap #10 is invoked by a
- ; program.
-
- program_start: ; Calculate program size and retain result.
- lea program_end, a3 ; Fetch program end address.
- suba.l 4(a7), a3 ; Subtract basepage address.
-
- enter_supervisor_mode:
- move.l #0, -(sp) ; The zero turns on supervisor mode.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Go to supervisor mode.
- addq.l #6, sp ; Supervisor stack pointer (SSP) returned in D0.
- movea.l d0, a5 ; Save SSP in scratch register.
-
- install_trap_10_routine: ; Note: pointer = vector = pointer.
- lea $A8, a0 ; Fetch trap #10 pointer address.
- lea trap_10_routine, a1 ; Fetch address of trap #10 routine.
- move.l a1, (a0) ; Store trap address at pointer address.
-
- enter_user_mode:
- pea (a5) ; Restore supervisor stack pointer.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Go to user mode.
- addq.l #6, sp ; Reset stack pointer to top of stack.
-
- relinquish_processor_control: ; Maintain memory residency.
- move.w #0, -(sp) ; See page 121 of Internals book.
- move.l a3, -(sp) ; Program size.
- move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
- trap #1
-
- trap_10_routine:
-
- ; Expects a binary value in D0 which represents a measured interval. This
- ; algorithm converts the value in D0 to milliseconds (msec) then prints the
- ; value in decimal msec.
-
- preserve_value_in_d3:
- move.l d3, -(sp)
- convert_time_to_msec:
- move.l d0, d3 ; Save copy in D0 to add.
- asl.l #2, d3 ; Shift to multiply by 4.
- add.l d0, d3 ; To complete multiplication by 5.
-
- print_time:
- cmpi.l #999, d3 ; If time is less than 1000, then
- bgt no_space ; print a leading blank space for output
- lea space, a0 ; alignment.
- bsr print_string
- cmpi.l #99, d3 ; If time is less than 100, then
- bgt no_space ; print another leading blank space.
- lea space, a0
- bsr print_string
- no_space:
- move.l d3, d1 ; Move to D1 so can convert to ASCII decimal
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea units_label, a0
- bsr.s print_string
- move.l (sp)+, d3 ; Restore contents of D3.
- rte
-
- ;
- ; Subroutine
- ;
-
- print_string: ; Expects address of string to be in A0.
- pea (a0) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- space: dc.b " ",0
- units_label: dc.b " milliseconds", $D,$A,0
- bss
- align ; Align storage on a word boundary.
- program_end: ds.l 0
- end
-
-
- Figure 6.1 shows the execution results of program 25
- with the contents of memory location $444 unaltered (See
- page 252 of the Internals book and Dave's Write-With-Verify
- Lecture, page 88 of START's Spring, 1987 issue.). As you
- will see, it makes sense to perform the experiment twice,
- once with the ST standard value in this memory location and
- once with the value zero stored there. And since that
- subject has come up, this is a convenient spot at which to
- introduce a program that I use to configure certain ST
- system variables during boot. Refer to section 3.7 The ST
- System Variables, pages 250-257 of the Internals book.
-
- Program 27. A program used to configure system variables
- during boot. I do not guarantee that this program will work
- with all ST systems.
-
- ; Program Name: CONFIG.S
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a PRG extension. Move
- ; CONFIG.PRG to the AUTO folder of the boot disk.
-
- ; Program Purpose:
-
- ; Configures system variables.
-
- mainline:
- lea stack, a7 ; Point A7 to this program's stack.
-
- enter_supervisor_mode:
- move.l #0, -(sp) ; The zero turns on supervisor mode.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1 ; Supervisor stack pointer returned in D0.
- addq.l #6, sp
-
- ; Algorithm 1:
-
- ; Turns off keyclick sound. Refer to page 254 of the Internals book. The
- ; system variable at address $484 is a byte length variable. The bits of
- ; this variable have the meanings as indicated in the Internals book. The
- ; bit of interest is #0. When this bit is a one, the computer emits a
- ; click each time a key is pressed. When the bit is a zero, these clicks
- ; are not emitted. A zero is placed in this bit by replacing the content
- ; of the byte at $484 (which is 7 before the replacement, if key click is
- ; enabled) with $6.
-
- disable_key_click:
- move.b #6, $484 ; Refer to page 254 of the Internals book.
-
- ; Algorithm 2:
-
- ; Performs the printer installation that is accomplished by CONTROL.ACC.
- ; The printer configuration table consists of one word, stored at $E4A.
- ; The bits of this word have the following meanings:
-
- ; BIT MEANING IF ZERO MEANING IF ONE
- ; --- --------------- --------------
- ; 0 Dot matrix Daisy printer
- ; 1 Black/white Color printer
- ; 2 1280 dots/line 960 dots/line
- ; 3 Draft mode Final mode
- ; 4 Printer port Modem port
- ; 5 Continuous feed Single sheet
-
- ; Bits 6 through 15 are not used.
-
- install_printer:
- move.w #$4, $E4A
-
- ; Algorithm 3:
-
- ; Turns off disk write verify.
-
- disable_write_verify:
- move.w #0, $444 ; Refer to page 252 of the Internals book.
-
- return_to_user_mode:
- move.l d0, -(sp) ; Restore "before call" SSP.
- move.w #$20, -(sp) ; Function = super = GEMDOS $20.
- trap #1
- addq.l #6, sp
-
- terminate:
- move.w #0, -(sp) ; Function = p_term_old = GEMDOS $0.
- trap #1 ; GEMDOS call.
-
- ds.l 24 ; Stack.
- stack: ds.l 1 ; Address of stack.
- program_end: ds.l 0
- end
-
-
- Figure 6.1. Program 25's execution results with write-verify
- active.
-
- PRG_2DP.TOS Execution Results: Hard disk drive.
-
- Time to create, write and close WRITE_1.DAT: 160 milliseconds
- Time to read WRITE_1.DAT into buffer: 520 milliseconds
- Time to create, write and close WRITE_2.DAT: 115 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: G
-
- Load time: 55 milliseconds
- Execution time: 1025 milliseconds
-
- PRG_2DP.TOS Execution Results: ST Standard Formatted.
-
- Time to create, write and close WRITE_1.DAT: 3185 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 2650 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: A
-
- Load time: 385 milliseconds
- Execution time: 7840 milliseconds
-
- PRG_2DP.TOS Execution Results: Twister Formatted, Version 2.0.
-
- Time to create, write and close WRITE_1.DAT: 3195 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 2260 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: A
-
- Load time: 365 milliseconds
- Execution time: 7300 milliseconds
-
- PRG_2DP.TOS Execution Results: DC Formatter, Version 2.2.
-
- Time to create, write and close WRITE_1.DAT: 3185 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 2255 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: A
-
- Load time: 365 milliseconds
- Execution time: 7275 milliseconds
-
-
- Figure 6.2 shows the execution results of program 25
- with the contents of memory location $444 equal to zero,
- thereby disabling the write-verify function. Notice that
- the state of the write-verify function does not seem to
- effect disk access times when the program is executed from a
- hard disk partition. The hard disk times of figures 6.1 and
- 6.2 are identical, considering the 5 msec resolution of the
- system clock.
-
- Figure 6.2. Program 25's execution results with write-
- verify disabled.
-
- PRG_2DP.TOS Execution Results: Hard disk drive.
-
- Time to create, write and close WRITE_1.DAT: 160 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 520 milliseconds
- Time to create, write and close WRITE_2.DAT: 110 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: G
-
- Load time: 60 milliseconds
- Execution time: 1020 milliseconds
-
- PRG_2DP.TOS Execution Results: ST Standard Formatted.
-
- Time to create, write and close WRITE_1.DAT: 1590 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 1450 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: A
-
- Load time: 390 milliseconds
- Execution time: 4640 milliseconds
-
- PRG_2DP.TOS Execution Results: Twister Formatted, Version 2.0.
-
- Time to create, write and close WRITE_1.DAT: 1595 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 1060 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: B
-
- Load time: 365 milliseconds
- Execution time: 4100 milliseconds
-
- PRG_2DP.TOS Execution Results: DC Formatter, Version 2.2.
-
- Time to create, write and close WRITE_1.DAT: 1590 milliseconds
- Time to read WRITE_1.DAT into buffer 10 times: 540 milliseconds
- Time to create, write and close WRITE_2.DAT: 1050 milliseconds
-
- SPEEDTST.TTP Execution Results
- for PRG_2DP.TOS, loaded from drive: A
-
- Load time: 370 milliseconds
- Execution time: 4080 milliseconds
-
-
- Unfinished Business: Pushing Zero Onto the Stack
-
- In chapter 2, I discussed the execution speed and
- memory requirements of three ways to push a zero onto the
- stack. At that time, I used table 2.1 to list data that was
- observed and prepared manually. I am now going to introduce
- a program that prints comparative data for the three methods
- previously discussed. Because the time required to execute
- individual instructions is too short for accurate
- measurement, we rely on multiple executions of each
- instruction to lengthen the period of time that we are
- measuring. In addition, we are not as interested in
- absolute execution speed as we are in relative speed. We
- just want to know which instructions are faster, not how
- long it takes to execute one of them.
- Program 28 will generate values that we can compare
- objectively. In addition, the program will calculate the
- memory requirements for each of the three instructions in
- which we are interested. As a finale, the advantage of
- using the third method, when it is appropriate to do so, is
- illustrated by removal of the statement which needs be
- executed only once from the execution loop. That is the
- statement which prestores a zero in register D0.
-
- Program 28. Three ways to push zero onto the stack.
-
- ; Program Name: PUSHZERO.S
- ; Version: 1.003
-
- ; Assembly Instructions:
-
- ; Assemble in AssemPro PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type PUSHZERO.TOS on
- ; its command line and view this program's output in PUSHZERO.DAT.
-
- ; Program Function:
-
- ; For each of the three methods of pushing $0 onto the stack discussed in
- ; chapter two, calculates the memory occupied by each instruction, and
- ; calculates the execution time in milliseconds required for 50,000
- ; executions of each.
-
- ; Then, in order to emphasize the comparison for a more practical type of
- ; application, the one statement in the third algorithm that needs be
- ; executed only once, before the execution of the loop, is placed outside
- ; of the loop, and the time for 50,000 executions of the third method is
- ; repeated.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end address.
- adda.l #100512, a0 ; Attach large stack space to end of program.
-
- ; NOTE: The above method of declaring stack space at the end of this program is
- ; preferable to that which I have been using because, if a label were
- ; used to declare the stack in the bss section, then the label used
- ; to mark the end of the program would be too far to permit the program
- ; to be assembled in AssemPro's PC-relative mode.
-
- movea.l a0, a7 ; Point A7 to this program's stack.
- trap #6 ; Return unused memory to op system.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- push_method_1:
- lea header_1, a0
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- push_1_loop: ; Marks start of instruction in the loop.
- clr.w -(sp) ; Instruction in the loop.
- memory_1: ; Marks end of instruction in the loop.
- dbra d7, push_1_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- adda.l #100000, sp ; Reset stack pointer to top of stack.
-
- print_method_1_requisite_memory:
- lea header_5, a0 ; Print requisite memory header.
- bsr print_string
- lea memory_1, a0 ; Calculate number of bytes occupied by the
- lea push_1_loop, a1 ; instruction in the loop.
- suba.l a1, a0
- bsr print_requisite_memory
-
- push_method_2:
- lea header_2, a0
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- push_2_loop: ; Marks start of instruction in the loop.
- move.w #0, -(sp) ; Instruction in the loop.
- memory_2: ; Marks end of instruction in the loop.
- dbra d7, push_2_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- adda.l #100000, sp ; Reset stack pointer to top of stack.
-
- print_method_2_requisite_memory:
- lea header_6, a0 ; Print requisite memory header.
- bsr.s print_string
- lea memory_2, a0 ; Calculate number of bytes occupied by the
- lea push_2_loop, a1 ; instruction in the loop.
- suba.l a1, a0
- bsr.s print_requisite_memory
-
- push_method_3:
- lea header_3, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- push_3_loop: ; Marks start of instructions in the loop.
- moveq #0, d0 ; There are two instructions in the loop.
- move.w d0, -(sp) ;
- memory_3: ; Marks end of instructions in the loop.
- dbra d7, push_3_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- adda.l #100000, sp ; Reset stack pointer to top of stack.
-
- print_method_3_requisite_memory:
- lea header_7, a0 ; Print requisite memory header.
- bsr.s print_string
- lea memory_3, a0 ; Calculate number of bytes occupied by the
- lea push_3_loop, a1 ; instructions in the loop.
- suba.l a1, a0
- bsr.s print_requisite_memory
-
- modified_push_method_3:
- lea header_4, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- moveq #0, d0 ; Prestore $0 in D0.
- push_4_loop:
- move.w d0, -(sp) ; Contents of D0 onto the stack.
- dbra d7, push_4_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
- adda.l #100000, sp ; Reset stack pointer to top of stack.
-
- terminate:
- trap #8
-
- ;
- ; SUBROUTINES
- ;
-
- print_requisite_memory:
- move.l a0, d1
- trap #4
- bsr.s print_string
- lea header_8, a0
- bsr.s print_string
- rts
-
- print_string: ; Expects address of string to be in A0.
- move.l a0, -(sp) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b $D,$A,"PUSHZERO Execution Results",$D,$A,$D,$A,0
- header_1: dc.b " Elapsed time for clr.w -(sp): ",0
- header_2: dc.b " Elapsed time for move.w #0, -(sp): ",0
- header_3: dc.b " Elapsed time for moveq #0, d0",$D,$A
- dc.b " move.w d0, -(sp): ",0
- header_4: dc.b " Time for only move.w d0, -(sp): ",0
- header_5: dc.b " Requisite memory: ",0
- header_6: dc.b " Requisite memory: ",0
- header_7: dc.b " Requisite memory: ",0
- header_8: dc.b " bytes",$D,$A,$D,$A,0
- bss
- align
- program_end: ds.l 0
- end
-
- PUSHZERO Execution Results
-
- Elapsed time for clr.w -(sp): 180 milliseconds
- Requisite memory: 2 bytes
-
- Elapsed time for move.w #0, -(sp): 155 milliseconds
- Requisite memory: 4 bytes
-
- Elapsed time for moveq #0, d0
- move.w d0, -(sp): 155 milliseconds
- Requisite memory: 4 bytes
-
- Time for only move.w d0, -(sp): 125 milliseconds
-
-
- Accuracy of the Results
-
- It is true that we are interested in relative results,
- as far as instruction execution time is concerned, but it is
- worth remembering that when we view the results of a
- PUSHZERO execution, the elapsed times are calculated to a
- resolution of approximately 10 msec (milliseconds); 5 msec
- for the first call to get_time and 5 for the second call.
- How does this affect the accuracy of the program's
- report? Well, there are four cases to consider, each
- composed of two incidents, with each incident depending on
- whether we receive the time just before or just after the
- variable _hz_200 has been incremented.
-
- 1. Both the first and second times are received just
- after increment. In this case true time equals
- time, and loss of accuracy is due to overhead in
- making the calls and the inherent accuracy of the
- system clock.
-
- 2. The first time is received immediately after
- increment, the second is received immediately
- before. Then, true time equals time plus 5 msec
- plus overhead.
-
- 3. The first time is received immediately before
- increment, the second is received immediately after.
- Then, true time equals time minus 5 msec plus
- overhead.
-
- 4. Both the first time and second times are received
- immediately before increment. In this case true
- time equals time plus overhead.
-
- If you execute PUSHZERO repeatedly, you should notice
- some differences in reported times. For each of the timed
- events in the program, I have never seen a variance greater
- than 5 msec, however, the variance can be less than or
- greater than the most often reported value. We should
- expect this to happen because the resolution of the elapsed
- time calculations is 10 msec.
- To compare the reported times with the instruction
- execution times given in the Motorola's reference book, we
- need only consider the instructions that form the loop. For
- push_method_1, the instructions are:
-
- clr.w -(sp)
-
- dbra d7, push_1_loop
-
- The clr.w instruction requires 14 clock periods, the dbra
- instruction requires 10. Total clock periods for the loop
- is 24 X 50,000 = 1.2X106. Total time required to execute
- the loop is 1.2X106 X 1.25X10-7(time for one clock period) =
- 150 msec. The program reports the time most often as 180
- msec. The difference is due to inaccuracies in the system
- clock, the imprecise resolution and the time needed to
- execute non-related instructions.
-
- Increasing Loop Execution Speed
-
- My primary use of program 28 was to introduce a method
- of determining the relative execution speed and memory
- requirements of two or more algorithms so that these
- attributes could be compared. As an adjunct to the three
- primary algorithms, I took the opportunity to introduce the
- modified_push_method_3 algorithm. With this algorithm, I
- intended to show you an example for which the third method
- of pushing a zero onto the stack was definitely superior
- because it clearly decreased loop execution time by
- decreasing the number of instructions within the loop.
- The next program, CLR_MEM, extends the idea of
- increasing loop execution speed, not by decreasing the
- instructions within the loop, but by increasing them. We
- begin the example by considering the task of clearing 32000
- bytes of memory, the size of a video screen, to zero.
- Having been conditioned to consider quantities of memory in
- terms of bytes, the thought of clearing it a byte at a time
- should occur naturally. Program 29 also illustrates the
- advantages to be gained by thinking of memory in terms of
- words and longwords.
-
- Program 29. A program that clears 32,000 bytes of ram.
-
- ; Program Name: CLR_MEM.S
- ; Version: 1.004
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type CLR_MEM.TOS on
- ; its command line and view this program's output in CLR_MEM.DAT.
-
- ; Program Function:
-
- ; Expands the concepts established with program PUSHZERO. Here,
- ; simulating the type of algorithm that would be used to clear video screen
- ; memory, the execution time when clearing memory a longword each time,
- ; within a loop, is compared to that of the more obvious method of clearing
- ; a byte, or perhaps a word, each time through the loop.
-
- ; Then, the speed advantage of clearing a longword by storing the
- ; content of a cleared register is illustrated.
-
- ; Finally, the advantage of clearing more than one longword within the
- ; body of the loop is explored, with an eye out for the maximum amount of
- ; beneficial loop expansion.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea array, a0 ; Fetch program end = array address.
- adda.l #32000, a0 ; Add in array space.
- movea.l a0, a7 ; Point A7 to this program's stack.
- trap #6 ; Return unused memory to op system.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- clear_one_byte_algorithm:
- lea header_1, a0
- bsr print_string
- lea array, a0 ; A0 is pointer to 32000 byte array.
- move.l #31999, d7 ; D7 is counter for the clear loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clear_a_byte:
- move.b #0, (a0)+
- dbra d7, clear_a_byte ; Loop 32000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_one_word_algorithm:
- lea header_2, a0
- bsr print_string
- lea array, a0 ; A0 is pointer to 32000 byte array.
- move.l #15999, d7 ; D7 is counter for the clear loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clear_a_word:
- move.w #0, (a0)+
- dbra d7, clear_a_word ; Loop 16000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_one_longword_algorithm:
- lea header_3, a0
- bsr print_string
- lea array, a0
- move.l #7999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clear_a_longword:
- move.l #0, (a0)+
- dbra d7, clear_a_longword; Loop 8000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_one_longword_with_precleared_register:
- lea header_4, a0
- bsr print_string
- lea array, a0
- moveq #0, d1 ; Preclear D1 for use in the loop.
- move.l #7999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clear_with_register:
- move.l d1, (a0)+ ; Clear a longword.
- dbra d7, clear_with_register
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_4_longwords_algorithm:
- lea header_5, a0
- bsr.s print_string
- lea array, a0
- move.l #0, d1 ; Preclear D1 for use in the loop.
- move.l #1999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clr_4_longwords:
- move.l d1, (a0)+ ; A single move statement clears 4 bytes.
- move.l d1, (a0)+ ; Reduces number of loops to 4000.
- move.l d1, (a0)+ ; Reduces number of loops to 2666+.
- move.l d1, (a0)+ ; Reduces number of loops to 2000.
- dbra d7, clr_4_longwords ; Loop 2000 times, clear 16 bytes each time.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_8_longwords_algorithm: ; Will a further increase in move instructions
- lea header_6, a0 ; within the loop decrease execution time?
- bsr.s print_string
- lea array, a0
- move.l #0, d1 ; Preclear D1 for use in the loop.
- move.l #999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clr_8_longwords:
- move.l d1, (a0)+ ; A single move statement clears 4 bytes.
- move.l d1, (a0)+ ; Reduces number of loops to 4000.
- move.l d1, (a0)+ ; Reduces number of loops to 2666+.
- move.l d1, (a0)+ ; Reduces number of loops to 2000.
- move.l d1, (a0)+ ; Reduces number of loops to 1600.
- move.l d1, (a0)+ ; Reduces number of loops to 1333+.
- move.l d1, (a0)+ ; Reduces number of loops to 1142+.
- move.l d1, (a0)+ ; Reduces number of loops to 1000.
- dbra d7, clr_8_longwords ; Loop 1000 times, clear 32 bytes each time.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- terminate:
- trap #8
-
- ;
- ; SUBROUTINES
- ;
-
- print_string: ; Expects address of string to be in A0.
- move.l a0, -(sp) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b 'CLR_MEM Execution Results',$D,$A,0
- header_1: dc.b ' Clear one byte time: ',0
- header_2: dc.b ' Clear one word time: ',0
- header_3: dc.b ' Clear one longword time: ',0
- header_4: dc.b ' With register time: ',0
- header_5: dc.b ' Clear 4 longwords time: ',0
- header_6: dc.b ' Clear 8 longwords time: ',0
- bss
- align
- start_time: ds.l 1
- ds.l 96 ; Stack.
- stack: ds.l 1 ; Address of stack.
- array: ds.l 0
- end
-
- CLR_MEM Execution Results
- Clear one byte time: 100 milliseconds
- Clear one word time: 45 milliseconds
- Clear one longword time: 35 milliseconds
- With register time: 25 milliseconds
- Clear 4 longwords time: 15 milliseconds
- Clear 8 longwords time: 10 milliseconds
-
-
- If you assemble the program and repeatedly execute it,
- you will occasionally obtain 95 msec for the one byte time,
- 50 msec for the one word time, 15 msec for the 8 longwords
- time and etc. You know that variances are possible because
- of your experience with program 28, so don't let the
- variances bother you. The important things to realize are
- these:
-
- 1. Clearing the 32000 bytes a word at a time is just
- twice as fast as clearing it a byte at a time. This
- is so only because we were able to reduce the number
- of times through the loop by half. The amount of
- time to clear one word of memory is exactly equal to
- the time it takes to clear one byte.
-
- 2. The time it took to clear the 32000 bytes a longword
- at a time is not one/fourth the time it took to
- clear a byte, even though the number of loops was
- reduced by four, because it takes more time to clear
- a longword of memory than it does to clear a byte or
- a word.
-
- 3. Clearing a register to zero before we enter the
- loop, then storing the content of the register is
- faster than using a move.l #0 instruction within the
- loop.
-
- 4. Storing more than one longword within the loop
- reduced the loop execution time significantly.
- However, the times reported for the
- clear_four_longwords and clear_eight_longwords
- algorithms are too close to the resolution of the
- time recording functions; therefore, the
- significance of the reported values is questionable.
-
- Even though the results obtained for the longword
- algorithms mentioned in item four are not entirely reliable,
- they seem to suggest that, as the number of instructions
- within the loop is increased, we will begin to notice a
- saturation effect; that is, a reduction in the number of
- loops will affect the overall loop execution time to a
- lesser degree because of the amount of time spent within the
- loop.
- After I had seen the results of CLR_MEM, I could have
- rewritten the program so that it would produce more
- palatable results, however, if I had done that, I would have
- denied you the experience of viewing a program which
- produces valid results for only a portion of its execution.
- You would not know that I had not been completely successful
- the first time.
- In doing something to validate the results for the last
- two algorithms of program 29, one would be easily tempted to
- simply increase the number of times through each loop in the
- program by an identical factor, say 10, perhaps. In fact,
- it is the first method of improvement that occurred to me.
- When it didn't work, I was forced to examine the details of
- the DBRA instruction, something I had not done in a while.
- If you refer back to program 29, you can see that, if
- we increase the size of memory to be cleared by a factor of
- 10, we must clear 320,000 bytes in the clear_one_byte
- algorithm. To do that, we must loop 320,000 times,
- therefore, the value 319,999 must be prestored in the loop
- counter. What is wrong with this picture? The DBRA
- instruction (as does all of the DBcc instructions) permits
- its counter parameter to contain only a 16-bit value.
- The maximum count we can fit in 16 bits is 65535. So
- we see why we can't set the counter to 320,000 and expect
- valid results. Similarly, we know that a value of 159,999
- for the clear_one_word algorithm will not work, nor will a
- value of 79,999 work for the clear_one_longword algorithm.
- In order to rearrange CLR_MEM so that the results for each
- algorithm would be comparable, we would have to resort to a
- minor loop inside of a major loop. Preparing CLR_MEM2, by
- extracting the necessary portions from PRGM_1F and
- increasing the appropriate values, is a more elegant
- solution. And, while we are doing that, we may as well
- include an algorithm that will let us obtain another
- saturation effect observation point.
-
- Program 30. Measures the time to clear memory with greater
- accuracy.
-
- ; Program Name: CLR_MEM2.S
- ; Version: 1.002
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute SPAWN.TTP and type CLR_MEM2.TOS on its command line. View the
- ; output of this program in CLR_MEM2.DAT.
-
- ; Program Function:
-
- ; The four_longwords_time and the eight_longwords_time reported by program
- ; CLR_MEM are too short to be reliable. Both values are too close to
- ; the resolution of the time recording functions.
-
- ; In this program, the significance of the reported execution times is
- ; increased by increasing the number of loops for those two algorithms
- ; by a factor of 10. This will permit a more valid comparison between the
- ; two algorithms.
-
- ; In addition, since I am going through the trouble of preparing this
- ; special program, I include one more algorithm. This one to investigate
- ; the intermediate five_longwords_time.
-
- ; Why go through this trouble? Well, as the number of instructions within
- ; the loop increases, we should observe a saturation effect in execution
- ; speed reduction, even though the number of loops are decreased, simply
- ; because more time is spent inside the loop. This additional point of
- ; observation will help to confirm or dismiss that theory.
-
- ; Now, because we are increasing the number of loops by 10 in this program,
- ; if we want to compare the results we obtain with those obtained from
- ; program CLR_MEM, we will have to divide CLR_MEM2's results by 10.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end address.
- adda.l #320000, a0 ; Add in array space.
- movea.l a0, a7 ; Point A7 to this program's stack.
- trap #6 ; Return unused memory to op system.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- clear_4_longwords_algorithm:
- lea header_1, a0
- bsr print_string
- lea array, a0
- move.l #0, d1 ; Preclear D1 for use in the loop.
- move.l #19999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clr_4_longwords:
- move.l d1, (a0)+ ; A single move statement clears 4 bytes.
- move.l d1, (a0)+ ; Reduces number of loops to 40000.
- move.l d1, (a0)+ ; Reduces number of loops to 26666+.
- move.l d1, (a0)+ ; Reduces number of loops to 20000.
- dbra d7, clr_4_longwords ; Loop 20000 times, clear 16 bytes each time.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_5_longwords_algorithm:
- lea header_2, a0
- bsr.s print_string
- lea array, a0
- move.l #0, d1 ; Preclear D1 for use in the loop.
- move.l #15999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clr_5_longwords:
- move.l d1, (a0)+ ; A single move statement clears 4 bytes.
- move.l d1, (a0)+ ; Reduces number of loops to 40000.
- move.l d1, (a0)+ ; Reduces number of loops to 26666+.
- move.l d1, (a0)+ ; Reduces number of loops to 20000.
- move.l d1, (a0)+ ; Reduces number of loops to 16000.
- dbra d7, clr_5_longwords ; Loop 16000 times, clear 16 bytes each time.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- clear_8_longwords_algorithm:
- lea header_3, a0
- bsr.s print_string
- lea array, a0
- move.l #0, d1 ; Preclear D1 for use in the loop.
- move.l #9999, d7
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- clr_8_longwords:
- move.l d1, (a0)+ ; A single move statement clears 4 bytes.
- move.l d1, (a0)+ ; Reduces number of loops to 40000.
- move.l d1, (a0)+ ; Reduces number of loops to 26666+.
- move.l d1, (a0)+ ; Reduces number of loops to 20000.
- move.l d1, (a0)+ ; Reduces number of loops to 16000.
- move.l d1, (a0)+ ; Reduces number of loops to 13333+.
- move.l d1, (a0)+ ; Reduces number of loops to 11428+.
- move.l d1, (a0)+ ; Reduces number of loops to 10000.
- dbra d7, clr_8_longwords ; Loop 10000 times, clear 32 bytes each time.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- terminate:
- trap #8
-
- ;
- ; SUBROUTINES
- ;
-
- print_string: ; Expects address of string to be in A0.
- move.l A0, -(sp) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b 'CLR_MEM2 Execution Results',$D,$A,0
- header_1: dc.b ' Clear 4 longwords time: ',0
- header_2: dc.b ' Clear 5 longwords time: ',0
- header_3: dc.b ' Clear 8 longwords time: ',0
- bss
- align ; Align storage on a word boundary.
- ds.l 24 ; Stack.
- stack: ds.l 0 ; Address of stack.
- program_end: ds.l 0 ; Marks the end of program memory.
- array: ds.l 0
- end
-
- CLR_MEM2 Execution Results
- Clear 4 longwords time: 155 milliseconds
- Clear 5 longwords time: 145 milliseconds
- Clear 8 longwords time: 140 milliseconds
-
-
- Program 30's results indicate that those of program 29
- were actually valid enough. For if we divide the values
- reported by CLR_MEM2 by 10, we obtain 15.5, 15.0 and 14.0.
- In addition, this is the kind of data grouping we observe at
- a saturation level. Program 30's report gives us the
- confidence we need to utilize the clear_4_longwords
- algorithm to clear the video screen, knowing that we are
- prudently compromising between execution speed and memory
- requirement.
-
- Unfinished Business: The Morton Conclusion
-
- As promised at the end of chapter 3, I will resume my
- discussion of Mr. Morton's conclusion concerning
- multiplication. Program 31 records the relative speeds of
- the two methods of multiplying a word operand by four;
- however, I have added a little spice because, not only are
- the execution speeds for the two methods identical, but the
- shift method is the faster when multiplying a longword
- operand by four. Furthermore, the shift method requires
- less memory. I should also state that the adding dn to
- itself twice is faster than shifting left twice theory is
- supported in the Kelly-Bootle book on page 202.
-
- Program 31. The refutation.
-
- ; Program Name: TIMES_4.S
- ; Version: 1.005
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type TIMES_4.TOS on
- ; its command line and view this program's output in TIMES_4.DAT.
-
- ; Program Function:
-
- ; Compares the speed of multiplying by 4 with add dn,dn to asl #2,dn
- ; to determine which is faster. This experiment attempts to prove or
- ; refute the conclusion advanced by Mike Morton in his magazine article,
- ; "68000 Tricks and Traps", Byte, September, 1986, p163. As it turns out,
- ; when multiplying a word operand by 4, as Mr. Morton specifies in his
- ; article, the execution times are identical. But when multiplying a
- ; longword by 4, the shift algorithm is faster. Furthermore, the shift
- ; algorithm requires less memory.
-
- ; Author Stan Kelly-Bootle also asserts that the add.w dn,dn algorithm
- ; is a faster method of multiplying by 4 than is asl #2,dn at the top of
- ; page 202 of his book "680x0 Programming by Example".
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end address.
- trap #6 ; Return unused memory to op system.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- word_addition:
- lea header_1, a0
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- word_addition_loop: ; Marks start of instruction in the loop.
- add.w d0, d0 ; To multiply by two.
- add.w d0, d0 ; To multiply by four.
- memory_1: ; Marks end of instruction in the loop.
- dbra d7, word_addition_loop
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- print_word_addition_requisite_memory:
- lea header_5, a0
- bsr print_string
- lea memory_1, a0 ; Calculate number of bytes occupied by
- lea word_addition_loop, a1 ; the instructions in the loop.
- suba.l a1, a0
- bsr print_requisite_memory
-
- word_shift:
- lea header_2, a0
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- word_shift_loop: ; Marks start of instruction in the loop.
- asl.w #2, d0 ; Shift to multiply by 4.
- memory_2: ; Marks end of instruction in the loop.
- dbra d7, word_shift_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- print_word_shift_requisite_memory:
- lea header_6, a0
- bsr.s print_string
- lea memory_2, a0 ; Calculate number of bytes occupied by the
- lea word_shift_loop, a1 ; instruction in the loop, then store.
- suba.l a1, a0
- bsr.s print_requisite_memory
-
- longword_addition:
- lea header_3, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- addition_loop: ; Marks start of instructions in the loop.
- add.l d0, d0 ; To multiply by two.
- add.l d0, d0 ; To multiply by four.
- memory_3: ; Marks end of instruction in the loop.
- dbra d7, addition_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- print_longword_addition_requisite_memory:
- lea header_7, a0
- bsr.s print_string
- lea memory_3, a0 ; Calculate number of bytes occupied by the
- lea addition_loop, a1 ; instruction in the loop, then store.
- suba.l a1, a0
- bsr.s print_requisite_memory
-
- longword_shift:
- lea header_4, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- trap #3 ; Value of system clock returned in D0.
- move.l d0, d1 ; Save in trap call register.
- shift_loop: ; Marks start of instruction in the loop.
- asl.l #2, d0 ; Shift to multiply by 4.
- memory_4: ; Marks end of instruction in the loop.
- dbra d7, shift_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- print_longword_shift_requisite_memory:
- lea header_8, a0
- bsr.s print_string
- lea memory_4, a0 ; Calculate number of bytes occupied by the
- lea shift_loop, a1 ; instruction in the loop, then store.
- suba.l a1, a0
- bsr.s print_requisite_memory
-
- terminate:
- trap #8
-
- ;
- ; SUBROUTINES
- ;
-
- print_requisite_memory:
- move.l a0, d1
- trap #4
- bsr.s print_string
- lea header_9, a0
- bsr.s print_string
- rts
-
- print_string: ; Expects address of string to be in A0.
- move.l a0, -(sp) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b "TIMES_4 Execution Results",$D,$A,$D,$A,0
- header_1: dc.b " Word addition time: ",0
- header_2: dc.b $D,$A," Word shift time: ",0
- header_3: dc.b $D,$A," Longword addition time: ",0
- header_4: dc.b $D,$A," Longword shift time: ",0
- header_5: dc.b " Word addition requisite memory: ",0
- header_6: dc.b " Word shift requisite memory: ",0
- header_7: dc.b " Longword add requisite memory: ",0
- header_8: dc.b " Longword shift requisite memory: ",0
- header_9: dc.b " bytes",$D,$A,0
- bss
- align
- program_end: ds.l 0
- end
-
-
- TIMES_4 Execution Results
-
- Word addition time: 130 milliseconds
- Word addition requisite memory: 4 bytes
-
- Word shift time: 125 milliseconds
- Word shift requisite memory: 2 bytes
-
- Longword addition time: 180 milliseconds
- Longword add requisite memory: 4 bytes
-
- Longword shift time: 155 milliseconds
- Longword shift requisite memory: 2 bytes
-
- SPEEDTST.TTP Execution Results
- for TIMES_4.TOS, loaded from drive: G
-
- Load time: 50 milliseconds
- Execution time: 925 milliseconds
-
-
- Now that the subject of multiplication algorithms has
- been introduced, it seems appropriate to present a program
- which compares the MC68000 multiply instruction to
- algorithms that accomplish the same task. That this seems
- appropriate might seem enigmatic at first because
- multiplication instructions are part of the MC68000's
- repertoire. Nevertheless, as is indicated in Mr. Morton's
- article, multiplication algorithms which ignore those
- instructions are faster for certain multipliers even if they
- are not powers of 2.
- Program 32 illustrates this fact for multiplication by
- 5, a multiplier that is used in many of my programs to
- convert the system clock count to milliseconds.
- MULTIPLY.TOS compares three methods of multiplying by 5.
- The program can be executed from the desktop or by typing
- its name on the command line of SPAWN.TTP or SPEEDTST.TTP.
-
- Program 32. Compares the speed of three multiplication
- algorithms.
-
- ; Program Name: MULTIPLY.S
- ; Version: 1.002
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type MULTIPLY.TOS on
- ; its command line and view this program's output in MULTIPLY.DAT.
-
- ; Program Function:
-
- ; Measures the speed of multiplication algorithms.
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end address.
- trap #6 ; Return unused memory to op system.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- the_mulu_instruction:
- lea header_1, a0
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- mulu_loop: ; Marks start of instruction in the loop.
- mulu #5, d0 ; Instruction in the loop.
- memory_1: ; Marks end of instruction in the loop.
- dbra d7, mulu_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- addition:
- lea header_2, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- addition_loop: ; Marks start of instruction in the loop.
- move.l d0, d2 ; To add one.
- add.l d0, d0 ; To double to two.
- add.l d0, d0 ; To double to four.
- add.l d2, d0 ; To complete multiplication by 5.
- memory_2: ; Marks end of instruction in the loop.
- dbra d7, addition_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- shift_and_add:
- lea header_3, a0
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the push loop.
- trap #3 ; Fetch start time.
- move.l d0, d3 ; Save start_time in D3.
- shift_loop: ; Marks start of instruction in the loop.
- move.l d0, d2 ; Save a copy to add.
- asl.l #2, d0 ; Shift to multiply by 4.
- add.l d2, d0 ; To complete multiplication by 5.
- memory_3: ; Marks end of instruction in the loop.
- dbra d7, shift_loop ; Loop 50000 times.
- trap #3 ; Fetch end time.
- sub.l d3, d0 ; Subtract start time from end time.
- trap #10 ; Convert to decimal milliseconds and print.
-
- print__mulu_requisite_memory:
- lea header_4, a0
- bsr.s print_string
- lea memory_1, a0 ; Calculate number of bytes occupied by the
- lea mulu_loop, a1 ; instruction in the loop.
- bsr.s print_requisite_memory
-
- print_addition_requisite_memory:
- lea header_5, a0
- bsr.s print_string
- lea memory_2, a0 ; Calculate number of bytes occupied by the
- lea addition_loop, a1 ; instruction in the loop.
- bsr.s print_requisite_memory
-
- printS_shift_and_add_requisite_memory:
- lea header_6, a0
- bsr.s print_string
- lea memory_3, a0 ; Calculate number of bytes occupied by the
- lea shift_loop, a1 ; instruction in the loop.
- bsr.s print_requisite_memory
-
- terminate:
- trap #8
-
- ; SUBROUTINES
-
- print_requisite_memory:
- suba.l a1, a0
- move.l a0, d1
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea header_7, a0
- bsr.s print_string
- rts
-
- print_string: ; Expects address of string to be in A0.
- pea (a0) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b "MULTIPLY Execution Results",$D,$A,$D,$A,0
- header_1: dc.b " MULU time: ",0
- header_2: dc.b " Addition time: ",0
- header_3: dc.b " Shift and add time: ",0
- header_4: dc.b $D,$A," MULU requisite memory: ",0
- header_5: dc.b " Addition requisite memory: ",0
- header_6: dc.b " Shift and add requisite memory: ",0
- header_7: dc.b " bytes",$D,$A,0
- bss
- align
- program_end: ds.l 0
- end
-
- MULTIPLY Execution Results
-
- MULU time: 360 milliseconds
- Addition time: 260 milliseconds
- Shift and add time: 230 milliseconds
-
- MULU requisite memory: 4 bytes
- Addition requisite memory: 8 bytes
- Shift and add requisite memory: 6 bytes
-
-
- The results clearly show that the shift and add
- algorithm, in this case, is superior. It is 1.5 times
- faster than the MULU instruction, while requiring only 2
- additional bytes of memory. On page 168 of his article, Mr.
- Morton shows an example of multiplication by 17.
-
- LEA Versus ADDA For Stack Adjustments
-
- On page 166 of the same Morton article is a nice tip
- concerning stack pointer movement. Mr. Morton used so few
- words to describe this hint that it is easy to slide right
- by it while reading the article. Program 33 explores this
- subject in a bit more detail, confirming that adjustments to
- the stack using the LEA instruction is twice as fast as
- adjustments using the ADDA instruction. The reason that
- this comparison is of concern is that the ADDQ instruction
- can be used for stack adjustments only when the alteration
- is limited to a maximum of 8 bytes. Beyond that, the ADDA
- instruction is mandatory.
- Within the documentation of program 33, I have included
- a timing note concerning the timing method I have been using
- and its relationship to the instruction execution times
- listed in the Motorola manual. With suitable precautions,
- you may be able to used the information there to generate
- timing data that more closely conforms to the Motorola data
- if you should find it necessary or desirable.
-
- Program 33. Confirms that the LEA instruction is the better
- choice for stack adjustments greater than 8 bytes.
-
- ; Program Name: LEA_ADDA.S
- ; Version 1.004
-
- ; Assembly Instructions:
-
- ; Assemble in PC-relative mode and save with a TOS extension.
-
- ; Execution Instructions:
-
- ; Execute from the desktop; or execute SPAWN.TTP, type LEA_ADDA.TOS on
- ; its command line and view this program's output in LEA_ADDA.DAT.
-
- ; Program Function:
-
- ; Confirms (or refutes) the contention that lea $C(An), An is faster
- ; than adda.l #$C(An), where n = 1 - 7. Reference: "68000 Tricks and Traps",
- ; by Mike Morton, p.166, Byte, September, 1986. See the paragraph which
- ; begins "Small adjustments to the stack pointer...".
-
- calculate_program_size:
- lea -$102(pc), a1 ; Fetch basepage start address.
- lea program_end, a0 ; Fetch program end address.
- trap #6 ; Return unused memory to op system.
- lea stack, a7 ; Point A7 to this program's stack.
-
- print_heading:
- lea heading, a0
- bsr print_string
-
- lea_method:
- lea lea_time_msg, a0 ; Print label for lea execution results.
- bsr print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Value of system clock returned in D0.
- move.l d0, d1 ; Save time in scratch register.
-
- lea_loop: ; Marks start of instruction in the loop.
- lea $C(a2), a2 ; Instruction in the loop.
- memory_1: ; Marks end of instruction in the loop.
- dbra d7, lea_loop ; Loop 50000 times.
- trap #3 ; Get current value of system clock.
- sub.l d1, d0 ; Subtract beginning value from ending value.
- mulu #5, d0 ; Convert to milliseconds.
- sub.l #80, d0 ; Subtract dbra time and "error".
- ; See Timing Note below.
- move.l d0, d1 ; Transfer time for trap #4 call.
- convert_lea_time_to_ASCII_decimal:
- trap #4 ; Address of decimal string returned in A0.
- print_lea_time:
- bsr.s print_string ; Print the decimal string.
- lea time_msg, a0 ; Print units label.
- bsr.s print_string
-
- ; Timing Note:
-
- ; Until the count has expired, each dbra instruction requires 10 clock
- ; periods to execute. 49,999 dbra instructions require that amount of
- ; time. When the last dbra instruction is executed, the count has expired
- ; and that instruction requires 14 clock periods.
-
- ; Total clock periods consumed by the dbra instruction is
-
- ; 49,999 X 10 + 14 = 500,004.
-
- ; Each clock period is .000000125 second. Total time consumed by the
- ; dbra instructions is
-
- ; 500,004 X .000000125 = .0625 second = 62.5 milliseconds.
-
- ; If we assume that the 8 clock period execution time given in the
- ; Motorola reference manual for the lea d(An) instruction is correct,
- ; then 50,000 executions of lea $C(a2), a2 require
-
- ; 50,000 X 8 X .000000125 = 50 milliseconds
-
- ; Total time for the lea loop is 62.5 + 50 = 112.5 milliseconds.
-
- ; When no adjustment is made for "overhead" time, such as attendant
- ; instructions, which includes the dbra instruction, trap invocations and
- ; a few others, this program reports a lea loop execution time of 130
- ; milliseconds. We can assume that the excess 17.5 milliseconds (130-112.5)
- ; is partially due to the "overhead" time and partially due to a system
- ; clock frequency different from 8 mhz. We can simply combine both
- ; components into "overhead" time.
-
- ; With no adjustment for "overhead" time this program reports a loop
- ; execution time of 180 milliseconds for the adda.l #$C, a2 loop. We
- ; subtract the 17.5 milliseconds "overhead" from this to obtain the true
- ; loop time of 162.5 milliseconds.
-
- ; From this we subtract the dbra execution time of 62.5 milliseconds to
- ; get 100 milliseconds for 50,000 adda.l instruction..
-
- ; 62.5 milliseconds / 50,000 = .000002 second per instruction.
-
- ; .000002 / .000000125 = 16 clock periods per instruction
-
- ; The execution time given in the Motorola manual for the adda.l #d, An
- ; instruction is 16 clock periods.
-
- ; These calculations validate the method and justify subtracting the
- ; "overhead" time from the total loop execution times before printing the
- ; execution time for 50,000 executions of each of the two instructions of
- ; interest. Finally, we can combine dbra time and "overhead" time into
- ; a "new" overhead time that is 17.5 + 62.5 = 80 milliseconds.
-
- ; When the adjustments are made to the recorded loop times, the program
- ; prints out the correct time for 50,000 executions of each instruction,
- ; and the output correctly indicates that the lea instruction used in
- ; the program is twice as fast as the adda.l instruction used. The
- ; execution results of the program agree with the timing data given in
- ; the Motorola reference guide.
-
- adda_method:
- lea adda_time_msg, a0 ; Print label for adda execution results.
- bsr.s print_string
- move.l #49999, d7 ; D7 is counter for the loop.
- trap #3 ; Value of system clock returned in D0.
- move.l d0, d1 ; Save time in scratch register.
-
- adda_loop: ; Marks start of instruction in the loop.
- adda.l #$C, a2 ; Instruction in the loop.
- memory_2: ; Marks end of instruction in the loop.
- dbra d7, adda_loop ; Loop 50000 times.
- trap #3 ; Get current value of system clock.
- sub.l d1, d0 ; Subtract beginning value from ending value.
- mulu #5, d0 ; Convert to milliseconds.
- sub.l #80, d0 ; Subtract dbra time and "error".
- move.l d0, d1 ; Transfer time for trap #4 call.
- convert_adda_time_to_ASCII_decimal:
- trap #4 ; Address of decimal string returned in A0.
- print_adda_time:
- bsr.s print_string ; Print the decimal string.
- lea time_msg, a0 ; Print units label.
- bsr.s print_string
-
- print_lea_requisite_memory:
- lea lea_memory_msg, a0
- bsr.s print_string
- lea memory_1, a0 ; Calculate number of bytes occupied by the
- lea lea_loop, a1 ; instruction in the loop.
- bsr.s print_requisite_memory
-
- print_adda_requisite_memory:
- lea adda_memory_msg, a0
- bsr.s print_string
- lea memory_2, a0 ; Calculate number of bytes occupied by the
- lea adda_loop, a1 ; instruction in the loop.
- bsr.s print_requisite_memory
-
- terminate:
- trap #8
-
- ; SUBROUTINES
-
- print_requisite_memory:
- suba.l a1, a0
- move.l a0, d1
- trap #4 ; Returns address of decimal string in A0.
- bsr.s print_string
- lea memory_msg, a0
- bsr.s print_string
- rts
-
- print_string: ; Expects address of string to be in A0.
- pea (a0) ; Push address of string onto stack.
- move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
- trap #1 ; GEMDOS call
- addq.l #6, sp ; Reset stack pointer to top of stack.
- rts
-
- data
- heading: dc.b $D,$A,'LEA_ADDA Execution Results',$D,$A,$D,$A,0
- lea_time_msg: dc.b ' Time for 50,000 lea instructions: ',0
- adda_time_msg: dc.b ' Time for 50,000 adda.l instructions: ',0
- time_msg: dc.b ' milliseconds',$D,$A,0
- lea_memory_msg: dc.b $D,$A,' LEA requisite memory: ',0
- adda_memory_msg: dc.b ' ADDA.L requisite memory: ',0
- memory_msg: dc.b ' bytes',$D,$A,0
- bss
- align ; Align storage on a word boundary.
- ds.l 96
- stack: ds.l 0
- program_end: ds.l 0 ; Marks the end of program memory.
- end
-
- LEA_ADDA Execution Results
-
- Time for 50,000 lea instructions: 50 milliseconds
- Time for 50,000 adda.l instructions: 100 milliseconds
-
- LEA requisite memory: 4 bytes
- ADDA.L requisite memory: 6 bytes
-
-
- The programs that I have used to compare the execution
- speeds and requisite memory of specific instructions and
- algorithms should have, by now, revealed their similarity of
- structure. It is appropriate that I present a more general
- algorithm which can be used to perform such comparisons. By
- endowing this algorithm with a certain amount of
- intelligence, I will be able to reduce the size of the
- programs that compare algorithms, and, in addition, I will
- be able to introduce the subject of algorithmic
- intelligence, itself.
- At first, my inclination was to include the general
- purpose performance testing algorithm in this chapter, but
- the detail with which I intend to discuss it and its
- ramifications would force this chapter to become very long
- and unwieldy, therefore, I have decided to devote an
- exclusive chapter to the algorithm. The subject of the next
- chapter, then, is the design of an algorithm that is smart
- enough to accept a variety of instructions or other
- algorithms as input and to generate suitable data for
- execution speed and requisite memory comparisons.
- I suggest that, if at all possible, you read The Micro
- Millennium, by Christopher Evans, first published in 1980 by
- The Viking Press, before you turn to chapter 7. At the very
- least, I think that you should read Chapters 12 through 14
- which discuss the following subjects: The Nature of
- Intelligence, Can a Machine Think? and Towards the Ultra
- Intelligent Machine. If you follow this advice, then you
- will probably feel more comfortable with my use of the word
- intelligence as it applies to computers and computer
- algorithms.
-
- Conclusion
-
- The emphasis in this chapter, as it has been in
- previous chapters and as it will continue to be in future
- chapters, is accuracy, reliability, execution speed and
- minimum requisite memory. I feel that it is via a constant
- pressure to maintain these algorithmic attributes that all
- computer programming goals are eventually achieved. In
- chapter 7, I shall illustrate the manner in which the desire
- to maintain these attributes leads to the development of
- algorithms which assume an intelligence of their own,
- thereby violating the computers are dumb principle.
-
-