- Short: Patch CopyMem/Quick for 68060(040) v1.5d
- Author: sintonen@iki.fi (Harry "Piru" Sintonen)
- Uploader: sintonen@iki.fi (Harry "Piru" Sintonen)
- Version: 1.5d
- Type: util/boot
- Requires: 68060 or 68040, Kickstart 2.04
-
- Description:
- This is a small patch which replaces the CopyMem and CopyMemQuick
- functions of exec.library.
-
- These functions are optimized for the 68060 processor. They should
- also work with the 68040 processor; however, they might not be the
- fastest possible for the 68040.
-
- The patch tests for a 68040 or 68060 processor. If it can't find one,
- it doesn't install the patch and exits with a return code of 20 (=fail).
- It also fails if it can't allocate the necessary memory. If the MorphOS
- PPC kernel is running, it won't install the patch and will exit with a
- return code of 5 (=warn).
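-
- The exit behaviour described above can be sketched in C as follows. This
- is only an illustration, not the patch's actual code: cmq_exit_code is a
- hypothetical helper, and the order of the checks is an assumption. The
- AttnFlags masks and return codes use the values from exec/execbase.h and
- dos/dos.h.
-
```c
/* AttnFlags CPU masks, same values as in exec/execbase.h. */
#define AFF_68040 (1u << 3)
#define AFF_68060 (1u << 7)

/* AmigaDOS return codes, same values as in dos/dos.h. */
#define RETURN_OK    0
#define RETURN_WARN  5
#define RETURN_FAIL 20

/* Hypothetical helper: maps the startup conditions named in the readme
 * to the exit code. The order of the checks is an assumption. */
int cmq_exit_code(unsigned attn_flags, int morphos_running, int alloc_ok)
{
    if (morphos_running)
        return RETURN_WARN;             /* patch not installed, warn */
    if ((attn_flags & (AFF_68040 | AFF_68060)) == 0)
        return RETURN_FAIL;             /* no 68040 or 68060 found */
    if (!alloc_ok)
        return RETURN_FAIL;             /* memory allocation failed */
    return RETURN_OK;                   /* patch installed */
}
```
-
- For example, a machine reporting only the 68030 flag would get 20, and a
- 68060 machine running the MorphOS PPC kernel would get 5.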
-
- If the CPU is a 68040, CMQ060 will install a slightly improved version
- of the v1.4 routines. If the CPU is a 68060, routines with the new
- movem-loop are picked instead. Note that due to these movem copy loops,
- v1.5 is slightly slower in chipmem copies than v1.4. However, fast->fast
- copies are sped up, so I don't consider this a problem, especially since
- most copies are fast->fast.
-
- On average (measured with "TestIt" from CopyMemQuicker V2.8) these
- routines are 29.4% faster than the Kickstart 3.1 ones. CMQ060 v1.5 is
- on average 2.5% faster than CMQ060 v1.4.
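-
- These percentages appear to be the relative time saved, computed from
- the "Total" row of the speed comparison table in this readme (35.63 for
- Kickstart 3.1, 25.80 for CMQ060 V1.4, 25.16 for CMQ060 V1.5). A minimal
- sketch of that arithmetic, with percent_faster as a hypothetical name:
-
```c
/* Relative time saved, in percent, when a benchmark total drops
 * from 'before' to 'after'. Hypothetical helper for illustration. */
double percent_faster(double before, double after)
{
    return (before - after) / before * 100.0;
}
```
-
- percent_faster(35.63, 25.16) gives roughly 29.4, and
- percent_faster(25.80, 25.16) gives roughly 2.5.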
-
- The full source code is included. The source code was compiled with
- GenAm 3.14; it also compiles with PhxAss.
-
- Installation:
- Just copy CMQ060 into c:
- and insert CMQ060 into your s:startup-sequence.
-
- Some notes about Move16:
- Move16 is a new instruction of the 68040 and 68060 processors. It
- moves 16 bytes at once and uses burst accesses. Andreas Kleinert and
- Thomas Richter said there could be problems with the Move16 instruction
- on the Amiga, especially in chipram, caused by the DMA of the custom
- chips.
-
- So v1.5 of CMQ060 doesn't use Move16 from or into memory below
- $01000000 (Chipram, ZorroII-Fastram, I/O-Space, Kickstart,...). Move16
- is only used when the source and destination addresses are both higher
- than $00ffffff (32-bit fastram).
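-
- The address rule above can be written as a small C predicate. This is a
- sketch of the stated condition only: may_use_move16 and MOVE16_MIN_ADDR
- are hypothetical names, and any size or alignment conditions the real
- routine applies are not modelled here.
-
```c
#include <stdint.h>

/* First address considered safe for Move16 according to the readme. */
#define MOVE16_MIN_ADDR 0x01000000u

/* Hypothetical helper: returns 1 only if both the source and the
 * destination lie above $00ffffff, otherwise 0. */
int may_use_move16(uint32_t src, uint32_t dst)
{
    return src >= MOVE16_MIN_ADDR && dst >= MOVE16_MIN_ADDR;
}
```
-
- A copy from chipram at $00200000 into 32-bit fastram at $08000000 would
- therefore not use Move16, because the source lies below $01000000.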
-
- (If you didn't get any errors with V1.3 and want to get the most speed
- improvement, you could use CMQ060_Move16. This version uses Move16 also
- below $01000000, but you might get problems.
-
- If you want to avoid all problems which Move16 could cause [the 68040
- has some Move16 bugs], you should use Aminet:util/boot/CMQ030. This
- one never uses Move16 and is still faster than the other available
- patches.)
-
- Some notes about the movem bug:
- Some CPU cards have a bug in the bus controller, and these cards fail to
- perform movem properly with odd addresses. CMQ060 v1.5 autodetects such
- cards and will use the move-loop instead of the movem-loop with them. If
- the move-loop is picked, performance drops slightly compared to the
- movem-loop. Fortunately, such defective cards are rare. Special thanks to
- Harald Frank, who patiently explained the bug to me and gave me the idea
- for how to autodetect it.
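-
- Combined with the 1.5b history entry (the 68040 always gets the
- move-loop), the choice of copy loop can be sketched like this. The names
- are hypothetical, and how the bug itself is detected is not shown:
-
```c
/* The two copy-loop variants named in this readme. */
enum copy_loop { LOOP_MOVE, LOOP_MOVEM };

/* Hypothetical helper: picks the loop from the CPU type and the
 * result of the movem bus controller bug autodetection. */
enum copy_loop pick_copy_loop(int is_68060, int movem_bug_detected)
{
    if (!is_68060)
        return LOOP_MOVE;               /* 68040: move-loop is faster */
    return movem_bug_detected ? LOOP_MOVE : LOOP_MOVEM;
}
```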
-
- Version 1.5 author:
- Harry "Piru" Sintonen
- <sintonen@iki.fi>
-
- Original CMQ060 author:
- Dirk Busse
- Kropsburgstraße 8
- D-67141 Neuhofen
- Germany
- <dbusse@primus-online.de>
- <100.141999@germanynet.de>
-
- Speed comparison:
- There are some similar patches available on Aminet:
- CopyMemQuicker V2.8 from 1994 -> Aminet:util/boot/COPMQR28.lha
- PCM V1.0 from 1996 -> Aminet:util/boot/PCM_1.0.lha
- MCP also patches these functions.
-
- Here are some test results. All results were measured on the same AMIGA
- 1200 with a phase5 Blizzard PPC with a 68060 @ 50MHz. The Blizzard PPC
- memory speed setting for M68K was set to the fastest possible.
-
- The most surprising result is that PCM V1.0 is on average *slower* than
- the original Kickstart 3.1 routines!
-
- Benchmark: "TestIt" from CopyMemQuicker V2.8
- Columns (left to right): orig KS 3.1 | COPMQR V2.8 | MCP V1.33b1 |
- PCM V1.0 | CMQ030 V1.1 | CMQ060 V1.4 | CMQ060 V1.5 | CMQ060 Move16 V1.5
- CopyMem routines
- 565×64kB L->L 2.04 2.08 1.92 1.56 1.91 1.52 1.51 1.51
- 147×64kB L->L+1 0.94 0.68 0.57 0.68 0.56 0.57 0.56 0.56
- 413×64kB L->E 1.66 1.70 1.61 1.91 1.57 1.61 1.59 1.59
- 147×64kB L->E+1 0.94 0.68 0.57 0.68 0.56 0.57 0.56 0.56
- 147×64kB L+1->L 0.94 0.67 0.57 0.60 0.56 0.57 0.55 0.56
- 382×64kB L+1->L+1 1.62 1.39 1.29 1.05 1.30 1.03 1.02 1.02
- 147×64kB L+1->E 0.94 0.68 0.57 0.69 0.57 0.57 0.56 0.56
- 501×64kB L+1->E+1 1.91 1.89 1.95 2.34 1.96 1.96 1.93 1.93
- 501×64kB E->L 1.92 1.92 1.94 2.06 1.92 1.95 1.90 1.90
- 147×64kB E->L+1 0.94 0.67 0.57 0.68 0.57 0.57 0.55 0.55
- 382×64kB E->E 1.62 1.39 1.29 1.06 1.30 1.03 1.02 1.02
- 147×64kB E->E+1 0.94 0.68 0.57 0.68 0.57 0.57 0.56 0.56
- 147×64kB E+1->L 0.94 0.67 0.57 0.60 0.56 0.57 0.55 0.56
- 413×64kB E+1->L+1 1.71 1.70 1.60 1.93 1.61 1.60 1.56 1.56
- 147×64kB E+1->E 0.94 0.67 0.57 0.69 0.57 0.57 0.55 0.55
- 564×64kB E+1->E+1 2.10 2.06 1.91 1.56 1.92 1.52 1.50 1.50
- 33900×1kB L->L 0.43 0.42 0.37 1.49 0.36 0.36 0.36 0.36
- 9400×1kB L->L+1 0.58 0.33 0.20 0.24 0.20 0.19 0.19 0.19
- 24000×1kB E->E 0.68 0.30 0.26 1.01 0.27 0.26 0.26 0.26
- 196000×128B L->L 0.55 0.45 0.41 1.12 0.32 0.35 0.33 0.33
- 155000×128B E->E 0.75 0.40 0.34 1.10 0.34 0.30 0.30 0.31
- 588000×19B L->L 0.85 0.61 0.72 0.93 0.53 0.53 0.53 0.53
- 622000×18B L->L 0.86 0.51 0.71 0.89 0.51 0.50 0.50 0.51
- 663000×17B L->L 0.75 0.68 0.76 0.80 0.51 0.53 0.53 0.55
- 956000×16B L->L 0.82 0.71 1.04 1.05 0.59 0.72 0.55 0.55
- 1060000×8B L->L 0.85 0.72 0.89 1.03 0.62 0.53 0.55 0.55
- 1430000×4B L->L 0.80 0.63 0.94 1.12 0.71 0.45 0.45 0.48
- 2190000×1B L->L 0.74 0.61 1.40 0.88 0.44 0.66 0.66 0.70
- CopyMemQuick
- 565×64kB L->L 2.04 2.06 1.91 1.56 1.91 1.52 1.51 1.51
- 33900×1kB L->L 0.43 0.43 0.37 1.27 0.36 0.36 0.35 0.35
- 196000×128B L->L 0.53 0.43 0.38 1.09 0.31 0.32 0.30 0.30
- 956000×16B L->L 0.73 0.63 0.94 1.06 0.42 0.58 0.42 0.42
- 1060000×8B L->L 0.53 0.57 0.80 0.63 0.44 0.42 0.42 0.42
- 1430000×4B L->L 0.43 0.51 0.80 0.60 0.31 0.28 0.28 0.31
- Total
- 35.63 30.70 31.48 36.84 27.31 25.80 25.16 25.31
-
- History:
- 1.0 (12.Sep.1998)
- - First public version.
- 1.1 (15.Sep.1998)
- - V1.0 exited with a return code of 10 (=error) if it couldn't find
- a 68040 or 68060 or couldn't get the necessary memory.
- V1.1 exits in these cases with a return code of 20 (=fail).
- - Fixed a mistake in the readme.
- 1.1b (19.Sep.1998)
- (I didn't change the patch itself! It's the same as V1.1)
- - Added the test results of MCP V1.30 to the readme.
- - Added CMQ060beep and CMQ060beepCMQ (see above).
- 1.2 (29.Nov.1998)
- - Added the test results of MCP V1.32b12 to the readme.
- - Changed the source code.
- There was a problem with a badly written program which expects
- the address of the last source byte +1 in A0 and the address
- of the last destination byte +1 in A1.
- This version of CMQ060 solves problems with such badly written
- programs. It's now 100 bytes longer, but the speed is the same. Big
- moves by the CopyMem function will be one or two cycles faster, but
- you won't notice it.
- 1.3 (5.Jan.1999)
- None of the changes made in this version affect the speed. They
- are only there to avoid problems with future versions of AMIGA OS.
- - changed the version string to the "standard" format
- - changed BMI to BCS and BPL to BCC
- -> now CMQ030 could move blocks bigger than 2 GigaByte ;-)
- 1.4 (3.Apr.1999)
- - CMQ060 now doesn't use Move16 into/from memory below $01000000
- - added CMQ060Move16 (It's the same as CMQ060 V1.3)
- - added the test results of CMQ030 (which never uses Move16)
- 1.5 (5.Sep.2000)
- - Totally rewrote the source code.
- - Bugfix: Fixed a major bug in the patch init: if the memory was
- allocated near a 64k boundary, CMQ060 trashed innocent memory and
- crashed the system completely. Odds were 1/8192 for this to
- happen.
- - Speedup: Removed two pipeline stalls from big copies.
- - Speedup: Optimized the non-move16 copy loop; it now uses movem.l
- instead of move.l. Slightly slower in chipmem copies, however
- fast -> fast copies are sped up.
- - Speedup: Unrolled the bigcopy-loops to do 256 bytes per
- iteration.
- - Added a MorphOS check; it makes no sense to slow down MorphOS
- with m68k patches.
- - Redid all speed tests; MCP was tested with 1.33b1. Added the V1.4
- result for reference. Cleaned up this readme.
- 1.5b (6.Sep.2000)
- - With the 68040 the move-loop is faster than the movem-loop. So, now
- always pick the move-loop for the 68040. Thanks to Chip for benchmark
- results.
- - Added autodetect for the movem bus controller bug. Now automagically
- picks between the movem- and move-loop on the 68060.
- - Fixed the Kickstart requirement; the 68040 wasn't officially
- supported before Kickstart 2.04.
- 1.5c (7.Sep.2000)
- - Bugfix: the movem bus controller bug autodetect was itself buggy. Fixed.
- - Made the source compile with PhxAss.
- 1.5d (11.Sep.2000)
- - Bugfix: the movem bus controller bug autodetect still had a potential
- problem. Fixed.
-