|
Volume Number: 14 (1998)
Issue Number: 12
Column Tag: Programmer's Challenge
Dec 98 Programmer's Challenge
by Bob Boonstra, Westford, MA
Word Neighbors
Several of you have written me to point out that the Challenges have been getting more and more difficult over the years. Thinking back over the past few months, where we have solved an orbital mechanics problem, programmed a winning Hearts solution, and written an emulator for the first stored-program computer, I can see where this might be true. This month the problem is easier. Your Challenge is to write a search routine that will find a set of words that are near to one another in some target text. You might imagine the solution being used as part of an internet search engine, providing a capability more selective than looking for all of the target words being present on a page, but less restrictive than requiring the target words to form a phrase. The prototype for the code you should write is:
#if defined(__cplusplus) extern "C" { #endif pascal void Initialize( char *text, /* NULL terminated text to be searched */ long distance, /* max distance between nearby words */ void *privateStorage, /* private storage for your use */ long storageBytes /* number of bytes in privateStorage */ ); pascal long FindNearby( /* return number of matches found */ char *words[], /* words to find in text */ long numWords, /* number of words */ Boolean caseSensitive, /* true if match is case sensitive */ Boolean preserveOrder, /* true if words must be found in order */ long matchPositions[], /* position in text of first word in match */ long maxMatches /* max number of matches to return */ ); #if defined(__cplusplus) } #endif
Your Initialize routine is called to give you an opportunity to preprocess the text to be searched. The text consists of a sequence of words separated by delimiters. For the purpose of this Challenge, a word is a sequence of alphanumeric characters separated by one or more non-alphanumeric characters. The text may include ASCII characters between 0x20 and 0x7E, inclusive, plus carriage returns (0x0D), linefeeds (0x0A), and tabs (0x09). All non-alphanumeric characters are delimiters.
You are given the maximum distance to be allowed between adjacent search words. A distance of 0 corresponds to adjacent words, a distance of 1 corresponds to search words with one intervening word, etc. You are also given a privateStorage pointer to storage available for your use, along with the size (storageBytes) of privateStorage in bytes. There will be at least 16 bytes of storage for each unique (case-sensitive) word in the text, plus 4 bytes for each word occurrence.
For each time that Initialize is called with new text to be searched, your FindNearby routine will be called an average of 10 to 50 times with a set of numWords words to find. You will be told whether the search is to be caseSensitive or not, and whether the words must be found in order (preserveOrder==TRUE) in the text. You must find the first maxMatches occurrences of the words in the text, where a match occurs when each of the words is separated by no more than a maximum number (distance) of intervening words from another of the search words. For each match, you should return in matchPositions the offset in text of the first of the search words found in text. For example, if distance==2 and the text is "The quick brown fox jumps over the lazy dog.", you would return 4 if the words were "brown", "quick", and "jumps", provided preserveOrder is FALSE. No word in the text may be part of more than one match.
The winner will be the solution that correctly finds the word matchPositions in the minimum execution time, including the time taken by the Initialize routine..
This will be a native PowerPC Challenge, using the latest CodeWarrior environment. Solutions may be coded in C, C++, or Pascal.
Three Months Ago Winner
Congratulations once again to Ernst Munter (Kanata, Ontario) for submitting the fastest solution to the September "Big Baby" Challenge. Eleven people submitted entries that simulated the operation of the Manchester Mark 1 prototype computer, the world's first stored program computer, which was fifty years old this past June. Ernst was one of four entries to take advantage of our annual September opportunity to include assembly language in their solution. One of the novel things about the winning solution is the way Ernst executes instructions in pairs, trying to take full advantage of the PowerPC processor. The solution accurately models an unusual overflow characteristic of the original Baby computer, described in the Baby documentation and mentioned in the solution comments. However, because this overflow behavior occurs infrequently, the winning solution detects an overflow in its fast mode of execution and switches to a slower mode of operation to process it.
The problem statement required contestants to treat memory the way it appeared on the Baby CRT display, with the least significant bit on the left. Some of the solutions did this via enormous table lookups, leading to a large memory footprint. Ernst did the bit-flipping in his CopyMem routine using the TWIDDLE macro. He uses one of the PowerPC's most versatile instructions, the Rotate Left Word Immediate Then And With Mask, or rlwinm instruction, to first flip bytes, then 4-bit nibbles, then 2-bit chunks, and finally individual bits.
I used nine test cases to evaluate the Big Baby solutions, including four that executed the Assembler function. While the original Baby computer used only 5 address bits, allowing it to address 32 words of memory, the Big Baby Challenge required Challengers to deal with larger memory spaces, and my test cases used up to 8 address bits. The test cases included two programs demonstrating self-modifying code: a 16-bit counter program submitted by Ernst Munter, and a BabyFill program submitted by Peter Lewis that counts the number of words of memory available. (Both Ernst and Peter earn two points for submitting these programs.) The test cases also included two programs submitted to the University of Manchester Baby contest last June, a prime number generation program, and a "3 minute noodle timer" program that is quite interesting to watch execute on the graphical Baby simulator available at the University of Manchester web site. The 16-bit counter, BabyFill, and prime number programs are reproduced below.
16-bit counter program 32 0000 JMP 18 0001 LDN 20 0002 JRP 19 0003 NUM 536870911 0004 NUM 134217727 0005 NUM 33554431 0006 NUM 8388607 0007 NUM 2097151 0008 NUM 524287 0009 NUM 131071 0010 NUM 32767 0011 NUM 8191 0012 NUM 2047 0013 NUM 511 0014 NUM 127 0015 NUM 31 0016 NUM 7 0017 NUM 1 0018 JMP 21 0019 NUM 23 0020 NUM -24593 0021 JRP 0 0022 LDN 30 0023 STO 30 0024 LDN 30 0025 SUB 17 0026 STO 30 0027 SUB 21 0028 STO 29 0029 NUM 0 0030 NUM 0 0031 CMP BabyFill 16 0000 NUM 0 0001 JRP 4 0002 STO 0 0003 NUM -1 0004 NUM 6 0005 NUM 1 0006 NUM -32771 0007 NUM -14 0008 LDN 14 0009 SUB 5 0010 STO 14 0011 LDN 14 0012 STO 14 0013 LDN 6 0014 STO 15 0015 LDN 7 Prime Number generator 30 0000 JMP 24 0001 LDN 21 0002 STO 21 0003 LDN 21 0004 SUB 15 0005 STO 21 0006 LDN 15 0007 STO 22 0008 LDN 22 0009 STO 22 0010 LDN 22 0011 SUB 15 0012 STO 22 0013 SUB 21 0014 CMP 0015 NUM -1 0016 LDN 21 0017 STO 23 0018 LDN 23 0019 SUB 22 0020 JMP 0 0021 NUM 1 0024 NUM 7 0025 CMP 0026 JRP 0 0027 STO 23 0028 LDN 23 0029 SUB 22 0030 CMP 0031 JMP 20
The table below lists, for each of the solutions submitted, the total execution time, the number of the nine test cases that the entry processed incorrectly, and the execution times of three individual test cases. Test case 3 is the 16-bit counter mentioned above. Case 6 is the prime number generator, which was executed 30 times for each entry, producing the 30th prime number as the final result. Test case 8 was the "3 minute" noodle timer, which executed in 3 minutes on the original Baby, but which would leave our noodles seriously undercooked on my 200MHz 604e. You can see that the solutions rank in the same speed order for each of the individual test cases as they do in the aggregate. The table also includes code size, data size, and programming language used for each solution. As usual, the number in parentheses after the entrant's name is the total number of Challenge points earned in all Challenges prior to this one. Four of the 11 entries crashed or hung my machine; I've replaced the author's name with their initials in those cases.
Name | Time | Errors | Case3 | Case6 | Case8 | Code | Data | Lang. |
Ernst Munter (408) | 41.01 | 0 | 29.67 | 2.99 | 6.56 | 9636 | 104 | C++/asm |
ACC Murphy (34) | 54.46 | 0 | 39.37 | 4.20 | 8.20 | 2640 | 16 | asm |
Rob Shearer | 64.10 | 0 | 46.54 | 5.63 | 8.91 | 2340 | 131246 | C++ |
Rob Managan | 77.85 | 0 | 55.25 | 7.24 | 11.66 | 1940 | 324 | C |
Daniel Harding | 110.10 | 0 | 78.90 | 9.49 | 15.98 | 1424 | 8632 | C++ |
Randy Boring (83) | 69.65 | 1 | 49.87 | 5.38 | 10.20 | 1356 | 1120 | C/asm |
Willeke Rieken (47) | 70.08 | 2 | 51.04 | 5.93 | 10.49 | 1748 | 16 | C |
A. C. | Crash | 892 | 692 | C | ||||
M. P. | Crash | 640 | 131104 | asm | ||||
T. B. | Crash | 2248 | 116 | C | ||||
D. F. | Hang | 1056 | 237 | C |
Top Contestants
Listed here are the Top Contestants for the Programmer's Challenge, including everyone who has accumulated 20 or more points during the past two years. The numbers below include points awarded over the 24 most recent contests, including points earned by this month's entrants.
- Munter, Ernst 206
- Boring, Randy 56
- Mallett, Jeff 50
- Saxton, Tom 49
- Rieken, Willeke 47
- Cooper, Greg 44
- Maurer, Sebastian 40
- Heithcock, JG 37
- Murphy, ACC 34
- Nicolle, Ludovic 34
- Lewis, Peter 31
- Hart, Alan 21
- Antoniewicz, Andy 20
- Day, Mark 20
- Higgins, Charles 20
- Hostetter, Mat 20
- Studer, Thomas 20
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use. The points you can win are:
- 1st place 20 points
- 2nd place 10 points
- 3rd place 7 points
- 4th place 4 points
- 5th place 2 points
- Finding bug 2 points
- Suggesting Challenge 2 points
Here is Ernst's winning Big Baby solution:
Baby.cp
Copyright © 1998 Ernst Munter
/* Version 3 The Problem ------ Write an assembler and emulator for "Baby" Mark I, the first computer to store a program electronically (back in 1948). Instead of just 32 words of store, "BigBaby" can have up to 1024 words. The emulator should handle all programs in the same way as the M1SIM simulator of the University of Manchester. This includes the peculiar overflow behaviour associated with the CI / PI interaction. Bit flipping ------ The Baby convention of displaying the LSB as bit 0 on the left is also emulated, in effect by reversing the bits in memory before and after execution. This feature can be turned off. The Assembler ------- The assembler is unremarkable. It happens that the opcodes for this small set of instruction mnemonics can be separated by a simple hash function, and then translated to the actual codes through a very small table. This is quicker than a switch statement or a series of compare/branch. The Emulator ------ The emulator is "lazy" about the peculiar CI / PI behaviour which is not expected to come up with most programs. Only enough logic is provided in the fast part of the emulator to be able to detect CI overflow as an "exception". When such an exception occurs, the emulator switches over into a slower but accurate emulation of the particular Baby design, and the program then finishes in that part. Both parts of the emulator are structured around computed jumps to a set of instruction blocks which emulate each Baby opcode. Slower Loop ------ The slower section selects one instruction at a time, and processes it to completion before fetching the next instruction. Faster Loop ------ The faster loop of the emulator attempts to execute two instructions in parallel, thus striving to exploit the multiple integer execution units in the powerPC chip. This results in a 64-way switch, for all combinations in pairs of Baby opcodes. In addition, some instruction pairs lend themselves to interleaving of the execution of the present two instructions with fetching and decoding of the next two. In a previous version, I spread the fetch, decode, and execute of three instructions over three consecutive blocks, thus having effectively 3 instructions in the pipeline. The new version is slightly faster. Wrapping ---- When the end of the defined memory is reached, the program must start again at location 0. In the slower loop (the "standard" implementation of the algorithm), program addresses are derived from CI by ANDing with a mask, and wrapping is automatic. In the faster loop, I use a self-incrementing program counter which directly addresses memory. This incurs some overhead when the end of the defined memory is reached. I have extended the memory (a copy) by two sentinel locations which are filled with STP instructions. When a STP is reached, a quick comparison determines if this is the true stop, or a sentinel. In the latter case, program execution is restarted at the first or second location. PI Addition Mode -------- Under certain conditions, a program instruction is created by the addition of the previous instruction with the present one. This condition (bit 13 set in CI) can only arise at three points: - a JMP (indirect JUMP) - a JRP (relative JUMP) - when wrapping, as CI counts up. All three occasions are already separated in the program, and allow a simple check of the state of CI. When the exceptional case is detected, program flow is directed into the slower loop which deals with the instruction addition as well as the possible "fatal" error, when CI-bits 14 or 15 are set. In that case, the emulator program simply returns. A note on the M1SIM implementation of this: CI is incremented at the start of each instruction, but the check of CI-bit 13 is made before incrementing CI. In this way, it is possible for CI to emerge with bit 13 set, but without instruction addition having taken place (but it likely will happen in the next cycle). Assumptions ------ The constant kMaxInstructions is only used to allocate memory, the memory size used by the Baby emulator algorithm is always memUsed = (1 << address_bits), i.e. all memory address calculations are modulo memUsed. If kMaxInstructions == (1 << address_bits), many PPC instructions can be saved by combining shift and mask in a single opcode (rlwinm) which expects constant parameters. To exploit this, I have provided two complete copies of the Execute(..) function where the correct one is automatically selected. The gain in speed is about 7% when kMaxInstructions matches address_bits. */ # include "Baby.h" /* Two versions of the memory copy function are provided. One which reverses the bit order of each 32-bit word, the other just a normal copy. The pre-processor directive UNUSUAL_BIT_ORDER selects. */ # define BIT_ORDER_REVERSED 1 #if BIT_ORDER_REVERSED static UInt32 RM[4]={ 0xFF00FF00,0xF0F0F0F0,0xCCCCCCCC,0xAAAAAAAA }; // Memory copy while flipping bits static asm void CopyMem( UInt32* dest,UInt32* src,UInt32 num) { # define mask8 r7 # define mask4 r8 # define mask2 r9 # define mask1 r10 lwz r6,RM(RTOC) lwz mask8,0(r6) lwz mask4,4(r6) lwz mask2,8(r6) lwz mask1,12(r6) mtctr r5 @loop: lwz r0,0(r4) addi r4,r4,4 rlwinm r5,r0,16,0,15 rlwinm r6,r0,16,16,31 add r0,r5,r6 # define TWIDDLE(x) \ rlwinm r5,r0,x,0,31-x; \ and r6,mask##x,r0; \ and r5,mask##x,r5; \ rlwinm r6,r6,32-x,x,31; \ add r0,r5,r6; TWIDDLE(8) TWIDDLE(4) TWIDDLE(2) TWIDDLE(1) stw r0,0(r3) addi r3,r3,4 bdnz @loop blr } # else // Simple memory copy static asm void CopyMem( UInt32* dest,UInt32* src,UInt32 num) { rlwinm r0,r5,30,2,31 mtctr r0 @loop: lwz r5,0(r4) lwz r6,4(r4) lwz r7,8(r4) lwz r8,12(r4) addi r4,r4,16 stw r5,0(r3) stw r6,4(r3) stw r7,8(r3) stw r8,12(r3) addi r3,r3,16 bdnz @loop blr } #endif /* Assembler for the Baby code mnemonics, with supporting functions. */ enum{ kOpShift = 13, JMP=0,JRP,LDN,STO,SUB,SU2,CMP,STP,NUM=0 }; const int xlateOP[16]={ JRP,STO,-1,-1,-1,-1,-1,CMP,LDN,-1,NUM,-1,-1,SUB,STP,JMP }; inline int GetUnsigned(char* & p) { int c,n=*p - 48; while ((c=*++p)>='0') n=n*10+c-48; return n; } inline int GetSigned(char* & p) { if (*p == '-') { return -GetUnsigned(++p); } else return GetUnsigned(p); } inline long GetOpcode(char* & p) { long OP=*((long*)(p)); p+=4; // The following hash function separates all legal // op code mnemonics into some value between 0 and 15. // The xlateOP table then translates this to the correct // value of each op code. return xlateOP[(OP ^ (OP>>8) ^ (OP>>13)) & 15]; } pascal void AssembleBabyProgram( char *program, CRT_memory memory, UInt32 address_bits) { // Reads and assembles strictly formatted ASM Baby programs // It is robust against unreasonable values, but // incorrectly formatted input will give wrong the output UInt32 memSize=1<<address_bits; if (memSize>kMaxInstructions) return; UInt32 addrMask=memSize-1; // read number of lines to follow int numLines=GetUnsigned(program); for (int i=0;i<numLines;i++) { // first item is a 4-digit decimal number // instructions can be in any order, // missing address locations are not touched int addr=addrMask & GetUnsigned(++program); // followed by a space and a 3-character op-code mnemonic int opcode=GetOpcode(program); memory[addr]=(opcode << kOpShift); if (opcode<CMP) // only op codes below CMP are followed by a number memory[addr]+=GetSigned(++program); } #if BIT_ORDER_REVERSED //flip bits in each word CopyMem(memory,memory,memSize); #else //no need to use copy #endif } /* Two very similar versions of the Execute function: the only difference is that Execute0 takes advantage of the fact when 1<<address_bits == kMaxInstructions to merge the and and shift functions into a single rlwinm opcode which requires constant parameters. */ # define nop3 nop;nop;nop # define nop4 nop3;nop # define nop5 nop4;nop # define nop6 nop5;nop # define nop7 nop6;nop # define nop8 nop7;nop # define nop9 nop8;nop # define nop10 nop9;nop # define nop11 nop10;nop # define nop12 nop11;nop # define nop13 nop12;nop # define nop14 nop13;nop # define nop15 nop14;nop // volatile registers must be named explicitely # define TEMP r0 # define MEM r3 # define addMask r4 # define PC r5 # define addShft1 r7 # define addShft2 r8 # define endPC r9 static asm void Execute(CRT_memory memory,UInt32 memUsed) { // non-volatile registers are assigned by the compiler register int* memSpan; register int* ACC; register int* BASE; register int* CI; register int* M1; register int* M2; register int* PI1; register int* PI2; register int* addr1; register int* addr2; register int* instruction; register int* CIinstField; fralloc rlwinm memSpan,addMask,2,0,29 add endPC,MEM,memSpan addi addMask,addMask,-1 b @theCode @GetCodeBase: mflr BASE li ACC,0 li CI,1 @restart: and TEMP,CI,addMask addi CI,CI,1 rlwinm PC,TEMP,2,0,29 add PC,MEM,PC lwz PI1,0(PC) lwzu PI2,4(PC) b @decode @loop: lwz PI1,4(PC) addi CI,CI,2 lwzu PI2,8(PC) @decode: rlwinm instruction,PI1,28,20,22 rlwimi instruction,PI2,25,23,25 and addr1,PI1,addMask add instruction,BASE,instruction mtctr instruction and addr2,PI2,addMask rlwinm addShft1,addr1,2,0,29 rlwinm addShft2,addr2,2,0,29 // computed jump to the selected pair of instructions bctr @theCode: bl @GetCodeBase #define doJMP1 \ lwzx CI,MEM,addShft1; \ addi CI,CI,1; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ mr PI1,PI2; \ b @exception #define doJMP2 \ lwzx CI,MEM,addShft2; \ addi CI,CI,1; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ b @exception #define doJRP1 \ lwzx M1,MEM,addShft1; \ add CI,M1,CI; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ mr PI1,PI2; \ b @exception #define doJRP2 \ lwzx M2,MEM,addShft2; \ addi CI,CI,1; \ add CI,M2,CI; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ b @exception #define doLDN(x) \ lwzx M2,MEM,addShft##x; \ neg ACC,M2 #define doLDN_STO \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M1,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ and addr1,PI1,addMask; \ add instruction,BASE,instruction; \ mtctr instruction; \ neg ACC,M1; \ and addr2,PI2,addMask; \ rlwinm addShft1,addr1,2,0,29; \ stwx ACC,MEM,addShft2; \ rlwinm addShft2,addr2,2,0,29; \ bctr #define doLDN_SUB \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M1,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ lwzx M2,MEM,addShft2; \ and addr1,PI1,addMask; \ add instruction,BASE,instruction; \ mtctr instruction; \ neg ACC,M1; \ and addr2,PI2,addMask; \ rlwinm addShft1,addr1,2,0,29; \ subf ACC,M2,ACC; \ rlwinm addShft2,addr2,2,0,29; \ bctr #define doSTO1(l) \ add TEMP,addShft1,MEM; \ cmpw PC,TEMP; \ stwx ACC,MEM,addShft1; \ bne+ @##l; \ lwz PI1,0(PC); \ lwzu PI2,4(PC); \ addi CI,CI,1; \ b @decode; \ @##l #define doSTO2 \ stwx ACC,MEM,addShft2 #define doSUB(x) \ lwzx M2,MEM,addShft##x; \ subf ACC,M2,ACC #define doSUB_STO \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M2,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ and addr1,PI1,addMask; \ add instruction,BASE,instruction; \ mtctr instruction; \ subf ACC,M2,ACC; \ and addr2,PI2,addMask; \ rlwinm addShft1,addr1,2,0,29; \ stwx ACC,MEM,addShft2; \ rlwinm addShft2,addr2,2,0,29; \ bctr #define doCMP1 \ cmpwi ACC,0; \ blt @loop #define doCMP2 \ cmpwi ACC,0; \ bge @loop; \ addi PC,PC,4; \ addi CI,CI,1 #define doSTP1 \ cmpw PC,endPC; \ ble- @exit; \ addi CI,CI,-1; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ mr PI1,PI2; \ b @exception #define doSTP2 \ cmpw PC,endPC; \ blt- @exit; \ rlwinm. CIinstField,CI,0,16,18; \ beq+ @restart; \ b @exception // @AAA_BBB labels define the instruction pair, for readability @JMP_JMP:doJMP1;nop10 @JMP_JRP:doJMP1;nop10 @JMP_LDN:doJMP1;nop10 @JMP_STO:doJMP1;nop10 @JMP_SUB:doJMP1;nop10 @JMP_SU2:doJMP1;nop10 @JMP_CMP:doJMP1;nop10 @JMP_STP:doJMP1;nop10 @JRP_JMP:doJRP1;nop10 @JRP_JRP:doJRP1;nop10 @JRP_LDN:doJRP1;nop10 @JRP_STO:doJRP1;nop10 @JRP_SUB:doJRP1;nop10 @JRP_SU2:doJRP1;nop10 @JRP_CMP:doJRP1;nop10 @JRP_STP:doJRP1;nop10 @LDN_JMP:doLDN(1);doJMP2;nop9 @LDN_JRP:doLDN(1);doJRP2;nop8 @LDN_LDN:doLDN(2);b @loop;nop13 @LDN_STO:doLDN_STO;nop @LDN_SUB:doLDN_SUB @LDN_SU2:doLDN_SUB @LDN_CMP:doLDN(1);doCMP2;b @loop;nop9 @LDN_STP:doLDN(1);doSTP2;nop9 @STO_JMP:doSTO1(1);doJMP2;nop3 @STO_JRP:doSTO1(2);doJRP2;nop;nop @STO_LDN:doSTO1(3);doLDN(2);b @loop;nop5 @STO_STO:doSTO1(4);doSTO2;b @loop;nop6 @STO_SUB:doSTO1(5);doSUB(2);b @loop;nop5 @STO_SU2:doSTO1(6);doSUB(2);b @loop;nop5 @STO_CMP:doSTO1(7);doCMP2;b @loop;nop3 @STO_STP:doSTO1(8);doSTP2;nop3 @SUB_JMP:doSUB(1);doJMP2;nop9 @SUB_JRP:doSUB(1);doJRP2;nop8 @SUB_LDN:doLDN(2);b @loop;nop13 @SUB_STO:doSUB_STO;nop @SUB_SUB:doSUB(1);doSUB(2);b @loop;nop11 @SUB_SU2:doSUB(1);doSUB(2);b @loop;nop11 @SUB_CMP:doSUB(1);doCMP2;b @loop;nop9 @SUB_STP:doSUB(1);doSTP2;nop9 @SU2_JMP:doSUB(1);doJMP2;nop9 @SU2_JRP:doSUB(1);doJRP2;nop8 @SU2_LDN:doLDN(2);b @loop;nop13 @SU2_STO:doSUB_STO;nop @SU2_SUB:doSUB(1);doSUB(2);b @loop;nop11 @SU2_SU2:doSUB(1);doSUB(2);b @loop;nop11 @SU2_CMP:doSUB(1);doCMP2;b @loop;nop9 @SU2_STP:doSUB(1);doSTP2;nop9 @CMP_JMP:doCMP1;doJMP2;nop9 @CMP_JRP:doCMP1;doJRP2;nop8 @CMP_LDN:doCMP1;doLDN(2);b @loop;nop11 @CMP_STO:doCMP1;doSTO2;b @loop;nop12 @CMP_SUB:doCMP1;doSUB(2);b @loop;nop11 @CMP_SU2:doCMP1;doSUB(2);b @loop;nop11 @CMP_CMP:b @loop;nop15 // a form of NOP ! @CMP_STP:doCMP1;doSTP2;nop9 @STP_JMP:doSTP1;nop9 @STP_JRP:doSTP1;nop9 @STP_LDN:doSTP1;nop9 @STP_STO:doSTP1;nop9 @STP_SUB:doSTP1;nop9 @STP_SU2:doSTP1;nop9 @STP_CMP:doSTP1;nop9 @STP_STP:doSTP1; @exit: frfree blr // If an exception (CI instruction bit set) was detected, // execution continues here, with a loop which checks CI // on every instruction. Not much optimized since we don't // expect to run in this mode very much @exception: //reconstruct CI: addi CI,CI,-1 b @theCode2 @getCodeBase2: mflr BASE @L1: // test CI bits 13-15 before incrementing CI // this appears to be what M1SIM.EXE does rlwinm. CIinstField,CI,0,16,18 addi CI,CI,1 and TEMP,CI,addMask rlwinm TEMP,TEMP,2,0,29 lwzx M1,MEM,TEMP //load instr word from memory beq- @L2 cmpwi CIinstField,0x2000 bne- @exit // return if CI instruction != 1 add PI1,PI1,M1 // add instruction to instruction !! b @L3 @L2: mr PI1,M1 // the normal case : PI = CW @L3: rlwinm instruction,PI1,23,25,27 // instruction * 16 add instruction,BASE,instruction mtctr instruction and addr1,PI1,addMask rlwinm addShft1,addr1,2,0,29 bctr @theCode2: bl @getCodeBase2 @JMP: lwzx CI,MEM,addShft1 b @L1; nop;nop @JRP: lwzx TEMP,MEM,addShft1 add CI,TEMP,CI b @L1; nop @LDN: lwzx TEMP,MEM,addShft1 neg ACC,TEMP b @L1; nop @STO: stwx ACC,MEM,addShft1 b @L1; nop;nop @SUB: lwzx TEMP,MEM,addShft1 subf ACC,TEMP,ACC b @L1; nop @SUB2: lwzx TEMP,MEM,addShft1 subf ACC,TEMP,ACC b @L1; nop @CMP: cmpwi ACC,0 bge @L1 addi CI,CI,1 b @L1 @STP: b @exit } # define KM kMaxInstructions // rMask provides the correct constant for the rlwinm assembly instruction # define rMask ( \ ((KM & 32) >> 5)*25+ \ ((KM & 64) >> 6)*24+ \ ((KM & 128) >> 7)*23+ \ ((KM & 256) >> 8)*22+ \ ((KM & 512) >> 9)*21+ \ ((KM &1024) >>10)*20 \ ) static asm void Execute0(CRT_memory memory,UInt32 memUsed) { // This version of Execute relies on kMaxInstructions to // equal memUsed (= 1 << address_bits) register int* memSpan; register int* ACC; register int* BASE; register int* CI; register int* M1; register int* M2; register int* PI1; register int* PI2; register int* instruction; register int* CIinstField; fralloc rlwinm memSpan,addMask,2,0,29 add endPC,MEM,memSpan b @theCode @GetCodeBase: mflr BASE li ACC,0 li CI,1 @restart: rlwinm PC,CI,2,rMask,29 addi CI,CI,1 add PC,MEM,PC lwz PI1,0(PC) lwzu PI2,4(PC) b @decode @loop: lwz PI1,4(PC) addi CI,CI,2 lwzu PI2,8(PC) @decode: rlwinm instruction,PI1,28,20,22 rlwimi instruction,PI2,25,23,25 add instruction,BASE,instruction mtctr instruction rlwinm addShft1,PI1,2,rMask,29 rlwinm addShft2,PI2,2,rMask,29 bctr @theCode: bl @GetCodeBase #undef doLDN_STO #define doLDN_STO \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M1,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ add instruction,BASE,instruction; \ mtctr instruction; \ neg ACC,M1; \ rlwinm addShft1,PI1,2,rMask,29; \ stwx ACC,MEM,addShft2; \ rlwinm addShft2,PI2,2,rMask,29; \ bctr #undef doLDN_SUB #define doLDN_SUB \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M1,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ lwzx M2,MEM,addShft2; \ add instruction,BASE,instruction; \ mtctr instruction; \ neg ACC,M1; \ rlwinm addShft1,PI1,2,rMask,29; \ subf ACC,M2,ACC; \ rlwinm addShft2,PI2,2,rMask,29; \ bctr #undef doSUB_STO #define doSUB_STO \ lwz PI1,4(PC); \ lwzu PI2,8(PC); \ lwzx M2,MEM,addShft1; \ addi CI,CI,2; \ rlwinm instruction,PI1,28,20,22; \ rlwimi instruction,PI2,25,23,25; \ add instruction,BASE,instruction; \ mtctr instruction; \ subf ACC,M2,ACC; \ rlwinm addShft1,PI1,2,rMask,29; \ stwx ACC,MEM,addShft2; \ rlwinm addShft2,PI2,2,rMask,29; \ bctr @JMP_JMP:doJMP1;nop10 @JMP_JRP:doJMP1;nop10 @JMP_LDN:doJMP1;nop10 @JMP_STO:doJMP1;nop10 @JMP_SUB:doJMP1;nop10 @JMP_SU2:doJMP1;nop10 @JMP_CMP:doJMP1;nop10 @JMP_STP:doJMP1;nop10 @JRP_JMP:doJRP1;nop10 @JRP_JRP:doJRP1;nop10 @JRP_LDN:doJRP1;nop10 @JRP_STO:doJRP1;nop10 @JRP_SUB:doJRP1;nop10 @JRP_SU2:doJRP1;nop10 @JRP_CMP:doJRP1;nop10 @JRP_STP:doJRP1;nop10 @LDN_JMP:doLDN(1);doJMP2;nop9 @LDN_JRP:doLDN(1);doJRP2;nop8 @LDN_LDN:doLDN(2);b @loop;nop13 @LDN_STO:doLDN_STO;nop3 @LDN_SUB:doLDN_SUB;nop;nop @LDN_SU2:doLDN_SUB;nop;nop @LDN_CMP:doLDN(1);doCMP2;b @loop;nop9 @LDN_STP:doLDN(1);doSTP2;nop9 @STO_JMP:doSTO1(1);doJMP2;nop3 @STO_JRP:doSTO1(2);doJRP2;nop;nop @STO_LDN:doSTO1(3);doLDN(2);b @loop;nop5 @STO_STO:doSTO1(4);doSTO2;b @loop;nop6 @STO_SUB:doSTO1(5);doSUB(2);b @loop;nop5 @STO_SU2:doSTO1(6);doSUB(2);b @loop;nop5 @STO_CMP:doSTO1(7);doCMP2;b @loop;nop3 @STO_STP:doSTO1(8);doSTP2;nop3 @SUB_JMP:doSUB(1);doJMP2;nop9 @SUB_JRP:doSUB(1);doJRP2;nop8 @SUB_LDN:doLDN(2);b @loop;nop13 @SUB_STO:doSUB_STO;nop3 @SUB_SUB:doSUB(1);doSUB(2);b @loop;nop11 @SUB_SU2:doSUB(1);doSUB(2);b @loop;nop11 @SUB_CMP:doSUB(1);doCMP2;b @loop;nop9 @SUB_STP:doSUB(1);doSTP2;nop9 @SU2_JMP:doSUB(1);doJMP2;nop9 @SU2_JRP:doSUB(1);doJRP2;nop8 @SU2_LDN:doLDN(2);b @loop;nop13 @SU2_STO:doSUB_STO;nop3 @SU2_SUB:doSUB(1);doSUB(2);b @loop;nop11 @SU2_SU2:doSUB(1);doSUB(2);b @loop;nop11 @SU2_CMP:doSUB(1);doCMP2;b @loop;nop9 @SU2_STP:doSUB(1);doSTP2;nop9 @CMP_JMP:doCMP1;doJMP2;nop9 @CMP_JRP:doCMP1;doJRP2;nop8 @CMP_LDN:doCMP1;doLDN(2);b @loop;nop11 @CMP_STO:doCMP1;doSTO2;b @loop;nop12 @CMP_SUB:doCMP1;doSUB(2);b @loop;nop11 @CMP_SU2:doCMP1;doSUB(2);b @loop;nop11 @CMP_CMP:b @loop;nop15 @CMP_STP:doCMP1;doSTP2;nop9 @STP_JMP:doSTP1;nop9 @STP_JRP:doSTP1;nop9 @STP_LDN:doSTP1;nop9 @STP_STO:doSTP1;nop9 @STP_SUB:doSTP1;nop9 @STP_SU2:doSTP1;nop9 @STP_CMP:doSTP1;nop9 @STP_STP:doSTP1; @exit: frfree blr // If an exception (CI instruction bit set) was detected, // execution continues here, with a loop which checks CI // on every instruction. @exception: //reconstruct CI: addi CI,CI,-1 b @theCode2 @getCodeBase2: mflr BASE @L1: // test CI bits 13-15 before incrementing CI // this appears to be what M1SIM.EXE does rlwinm. CIinstField,CI,0,16,18 addi CI,CI,1 rlwinm TEMP,CI,2,rMask,29 lwzx M1,MEM,TEMP //load instr word from memory beq- @L2 cmpwi CIinstField,0x2000 bne- @exit // return if CI instruction != 1 add PI1,PI1,M1 // add instruction to instruction !! b @L3 @L2: mr PI1,M1 // the normal case : PI = M1 @L3: rlwinm instruction,PI1,23,25,27 // instruction * 16 add instruction,BASE,instruction mtctr instruction rlwinm addShft1,PI1,2,rMask,29 bctr @theCode2: bl @getCodeBase2 @JMP: lwzx CI,MEM,addShft1 b @L1; nop;nop @JRP: lwzx TEMP,MEM,addShft1 add CI,TEMP,CI b @L1; nop @LDN: lwzx TEMP,MEM,addShft1 neg ACC,TEMP b @L1; nop @STO: stwx ACC,MEM,addShft1 b @L1; nop;nop @SUB: lwzx TEMP,MEM,addShft1 subf ACC,TEMP,ACC b @L1; nop @SUB2: lwzx TEMP,MEM,addShft1 subf ACC,TEMP,ACC b @L1; nop @CMP: cmpwi ACC,0 bge @L1 addi CI,CI,1 b @L1 @STP: b @exit } pascal void ExecuteBabyProgram( CRT_memory memory, UInt32 address_bits) { UInt32 memUsed=1<<address_bits; if (memUsed>kMaxInstructions) return; / copy of memory allocated on procedure stack UInt32 memCopy[kMaxInstructions+2]; CopyMem(memCopy,memory,memUsed); // 2 * STP as sentinel: memCopy[memUsed]=-1; memCopy[memUsed+1]=-1; if (memUsed==kMaxInstructions) Execute0(memCopy,memUsed); else Execute(memCopy,memUsed); // return processed memory to caller CopyMem(memory,memCopy,memUsed); }
- SPREAD THE WORD:
- Slashdot
- Digg
- Del.icio.us
- Newsvine