Due to the discussion on the software emulation of other systems on the
MEGADEV mailing list, I am uploading this comp.sys.amiga.emulations
discussion, which occurred in 1991.  It contains quite a bit of useful
information and ideas for anyone interested in producing their own
emulators.  It specifically concentrates on the 6502 and Z-80, although
the concepts are generally applicable for other systems as well.

						Ian.

>From: jnmoyne@lbl.gov (Jean-Noel MOYNE)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <10652@dog.ee.lbl.gov>
>Date: 6 Mar 91 03:03:52 GMT
>Organization: Lawrence Berkeley Laboratory
>References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>
>
>            Yes, interesting indeed. I've _heard_ (i.e. don't quote me on 
>that) that a company once did such an "application translator" on Unix, 
>where you could take an MS-DOS .exe file and the program would generate 
>Unix C source code, mapping all the MS-DOS int traps onto legal Unix OS 
>calls. I've heard this program was able to translate well-known commercial 
>programs, and that these programs were then able to run under a normal 
>Unix multi-user configuration. The end of the story is that the people 
>who made this software never even tried to sell it (guess why).
>
>        Anyway, doing such a thing wouldn't be impossible technically (but 
>it'd be hard for sure! (-:), especially for MS-DOS programs. All these 
>big software packages are made to run on all the different flavors of IBM 
>clones, which means they use no dirty tricks; they stick to BIOS calls so 
>they'll run even on an Amstrad (or under the Transformer or IBeM, for 
>instance). That means you could quite easily put all the BIOS ints in an 
>Amiga shared library and, taking the 8088 binary, generate C code from it. 
>Programs à la "re-sourcer" already exist: they take the .exe file and 
>generate an 8088 source code file from it, and at that point it's possible 
>to generate C source code that would do the same thing. You could optimize 
>it to recognize patterns in the ASM instructions (don't forget that most of 
>these big programs, the ones you'd be interested in running on the Amiga, 
>are written in C or some other high-level language, so the binary is 
>generated by a compiler, which creates patterns ...), and possibly build 
>C source code not too far from the original (the people who wrote Microsoft 
>C could probably write a reverse compiler that would generate code 
>very close to the original, and such a program wouldn't be machine 
>dependent at all).
>
>         Another problem would be to catch the busy-wait loops in these 
>single-tasking programs (I think most of these loops are in the BIOS, like 
>waiting for a key, which is easy to take care of). That way these programs 
>would be multitasking-friendly, and you could even run more than one 
>at the same time without losing performance.
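A minimal C sketch of the keyboard-wait case, assuming the BIOS interrupts are emulated by host routines as suggested above. The `regs8088` struct and the host hook type `host_wait_key_fn` are hypothetical, and only INT 16h function 00h (wait for a keystroke, returning the scan code in AH and the ASCII code in AL) is covered. The point is that the trap blocks on the host instead of letting the emulated program spin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice of the emulated 8088 register file. */
typedef struct {
    uint16_t ax;
} regs8088;

/* Hypothetical host hook: blocks until a key arrives (letting other
   tasks run meanwhile) and returns the BIOS keystroke word, scan code
   in the high byte and ASCII code in the low byte. */
typedef uint16_t (*host_wait_key_fn)(void);

/* Emulated BIOS keyboard service, INT 16h. Function AH=00h ("wait for
   keystroke") is where a program would otherwise spin; here the trap
   blocks on the host instead. Returns 0 if handled, -1 if the function
   is not covered by this sketch. */
int bios_int16(regs8088 *r, host_wait_key_fn wait_key)
{
    assert(r && wait_key);
    switch (r->ax >> 8) {          /* AH selects the function */
    case 0x00:
        r->ax = wait_key();        /* AH = scan code, AL = ASCII */
        return 0;
    default:
        return -1;
    }
}

/* Stand-in for the host hook: pretends the 'a' key was pressed. */
uint16_t fake_wait_key(void)
{
    return 0x1E61;                 /* scan code 1Eh, ASCII 'a' */
}
```

A real Amiga version would wait on a console or input device message inside the hook rather than returning a constant.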
>
>        Anyway, all this is just my mind wandering sideways (I've thought 
>about that before), and most of what I wrote is closer to dreams than to 
>reality (and in bad English, sorry about that, still learning (-:). Mostly 
>because, while it's not impossible to write such a program, it's certainly 
>very hard! And so it would take a lot of time and effort, more than a 
>lone guy doing it on the side of his regular job could invest. A 
>team in a company could do it, but no company is going to take such a risk; 
>there's too much potential danger in it, even if the potential sales are 
>very big! Sure, one could say that if you buy the original program, you 
>should be able to run it ... even on another machine, but I'm sure the 
>army of lawyers from, let's say, Microsoft or Ashton-Tate would have 
>something to say about that! And they could sue you for years without 
>breaking a sweat, preventing you from selling the program before the end 
>of the trial... etc ... etc ... 
>
>         If the program were freeware, written by a bunch of 
>programmers not earning money for it, this would be a different story, 
>very different. But I'm really no lawyer, I'm still thinking sideways .... 
>(-:
>
>
>            JNM
>
>--
>These are my own ideas (not LBL's)
>
>
>From: Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics (sorry long post)
>Message-ID: <4992@mindlink.UUCP>
>Date: 4 Mar 91 15:20:48 GMT
>Organization: MIND LINK! - British Columbia, Canada
>
>
> The following messages were shared in an email environment between Charlie
>Gibbs and myself over the last couple of days.  We decided that it would be a
>good idea to share these thoughts with the readership of
>comp.sys.amiga.emulations.  The messages speak for themselves for the most
>part.
>
>         ------------ Included text of messages ------------
>
>Sun Mar 3 6:40:53 1991  From: Chris Johnsen [547]  Subject : Emulators
>
>Hello Charlie!  Long time no talk to you eh?  I haven't seen you for quite some
>time.  I was particularly fond of those after-PaNorAmA meetings at Wendy's a
>few years ago.  I trust you recall me from back then.
>
>I have been following the comp.sys.amiga.emulations newsgroup on Usenet for
>quite a while now and was very interested in the IBeM program that has been
>posted about recently.  In one of those near alpha/theta states that a
>programmer such as yourself recognizes as some kind of "source", I had
>an idea.  Since not many truly "new" ideas are spawned, I wanted to get some
>feedback from a person with the expertise you possess.  The reasons I thought
>of you for this private consultation are: 1) you've written an emulator,
>SimCPM; 2) you've written an assembler, A68K; 3) you have a broad knowledge of
>the computing field beyond lowly PCs; and 4) you're such a nice guy!
>
>I was musing about how an emulator would work.  I must confess I have no
>concrete knowledge of this.  It has been said that "Not knowing how to do a job
>'right' frees you to discover new and better ways of accomplishing the desired
>result".  Freeman Patterson (a Canadian photographer), in his book Photography
>and the Art of Seeing, called this "thinking sideways".  I believe I've
>thought of a way of writing an emulator that would run at least as fast on the
>Amiga as it would on the source machine.
>
>My questions for you are:
>
>Would you be interested in being a mentor to me in developing this idea further
>or in fact ascertaining whether my concept has any validity?
>
>I'm reluctant to spill this on the net until I'm somewhat confident that it is
>indeed practicable.  I have thought about the possible legal considerations of
>producing an emulator.  There are potential commercial possibilities that one
>could consider also.
>
>Would you please give me just a quick, superficial rundown of the basic
>algorithms used in developing an emulator?
>
>I have assumed that you would read in the object module of say an IBM
>executable, read it by opcode or routine, decipher the intent and then call a
>library of glue routines to do the job that the program would have done on an IBM
>or clone.  I have no idea how the interrupt structure would be handled but know
>that you have done it with SimCPM.
>
>I don't want to waste your time, but I would appreciate this information. I
>would like to get your feedback on these questions to see if you are interested
>in further discussion with me on this matter.  If you are not interested or
>don't have the time I'll certainly understand.  I realize I'm asking you for
>more information than I'm sharing but an idea is a tiny property and I'd like
>to at least savor this for a while before I decide what to do with it.  That is
>the sole reason I have not gone public with it yet.
>
>If you are indeed interested, after you tell me on the highest level how an
>emulator functions, I'll be able to describe this idea on some kind of
>comprehensible basis. I'm looking forward to your response!  Thanks Charlie.
>
>csj
>
>Sun Mar 3 23:12:32 1991  From: Charlie Gibbs [218] Subject : Emulators
>
>     Indeed I am somewhat short of time these days, but I wouldn't mind kicking
>the odd idea around without getting too wrapped up in it.  I do understand your
>idea of "thinking sideways" and enjoy being able to do it myself from time to
>time.  A similar way I've heard it described is that even though it's been
>proven that bumblebees can't fly, they don't realize this and so do it anyway.
>Or to put it another way, I like to write programs that are too stupid to know
>what they're doing, so they can do anything.
>
>     Your ideas on emulation are basically in line with what's considered the
>standard way of doing things.  A machine instruction is analyzed as to just
>what it's supposed to do, and appropriate code then carries out the operation.
>The guts of SimCPM appeared in Dr. Dobb's Journal as an emulator that was meant
>to run under CP/M-68K.  This made the job a bit easier for the original author,
>since the CP/M-68K system calls were quite similar to the CP/M calls that he
>was emulating.  I had to replace this portion of code with appropriate AmigaDOS
>routines.  In addition, I extended the code to handle the full Z-80 instruction
>set, since the original code could only handle the 8080 subset.
>
>     Since emulating another processor in software is quite a CPU-intensive
>process (several machine instructions have to be executed to emulate a single
>machine instruction on the target machine) I tried to optimize SimCPM for speed
>at the expense of memory and redundant code.  The overhead of a single
>subroutine call, plus any extraction and interpretation of arguments, would
>require several times as much time as a hand-picked set of instructions
>dedicated to a single opcode.
>
>     For system calls there's an easy out - as soon as I recognize what the
>emulated program is trying to do (e.g. read a block from a disk file), I call
>the corresponding AmigaDOS routine, so I/O can proceed pretty well at native
>speeds.  Therefore, even though CPU-bound programs might run 10% as fast as
>they would on the target machine, I/O bound programs might get up to 50% of
>speed.
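The system-call shortcut can be sketched in C as follows. This is not SimCPM's code: it assumes the emulator notices when the emulated PC reaches the CP/M BDOS entry point at 0005h, and it covers only two BDOS functions, 2 (console output of the character in E) and 9 (print the '$'-terminated string at DE). The `cpm_machine` struct is hypothetical, and its output buffer is a stand-in for an AmigaDOS console write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t c, d, e;       /* 8080 registers the BDOS calls below use */
    uint8_t mem[65536];    /* emulated 64K address space */
    char    out[256];      /* stand-in for the AmigaDOS console window */
    size_t  outlen;
} cpm_machine;

/* Stand-in for an AmigaDOS Write() to the console. */
static void host_putc(cpm_machine *m, char ch)
{
    if (m->outlen + 1 < sizeof m->out) {
        m->out[m->outlen++] = ch;
        m->out[m->outlen] = '\0';
    }
}

/* Called when the emulated PC reaches the BDOS entry at 0005h: the
   function number is in C, the argument in E or DE. Returns 0 if the
   call was handled natively, -1 otherwise. */
int bdos_call(cpm_machine *m)
{
    assert(m);
    uint16_t de = (uint16_t)((m->d << 8) | m->e);
    switch (m->c) {
    case 2:                                 /* console output, char in E */
        host_putc(m, (char)m->e);
        return 0;
    case 9:                                 /* print '$'-terminated string at DE */
        for (unsigned n = 0; n < 65536u; n++) {
            uint8_t ch = m->mem[(uint16_t)(de + n)];
            if (ch == '$')
                break;
            host_putc(m, (char)ch);
        }
        return 0;
    default:
        return -1;                          /* not covered by this sketch */
    }
}
```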
>
>     Interrupts were easy - since most CP/M systems don't use hardware
>interrupts, except possibly for a very few hardware-dependent programs, I
>simply didn't worry about them.  Software interrupts (the RST instruction) were
>a snap on the other hand, since they're basically a special-purpose subroutine
>call.
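A C sketch of why RST is "basically a special-purpose subroutine call": on the 8080, RST n (opcode 11nnn111) pushes the return address exactly as CALL does and jumps to address n*8. The `cpu8080` struct layout is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;
    uint8_t  mem[65536];
} cpu8080;

/* Push a 16-bit value the way the 8080 does: high byte at SP-1,
   low byte at SP-2, then SP -= 2. */
static void push16(cpu8080 *c, uint16_t v)
{
    c->mem[(uint16_t)(c->sp - 1)] = (uint8_t)(v >> 8);
    c->mem[(uint16_t)(c->sp - 2)] = (uint8_t)v;
    c->sp = (uint16_t)(c->sp - 2);
}

/* RST n (opcodes C7h, CFh, ..., FFh) is a one-byte CALL to n*8.
   On entry c->pc already points past the RST opcode. */
void op_rst(cpu8080 *c, uint8_t opcode)
{
    assert((opcode & 0xC7) == 0xC7);    /* bit pattern 11nnn111 */
    push16(c, c->pc);                   /* return address */
    c->pc = (uint16_t)(opcode & 0x38);  /* nnn * 8 */
}
```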
>
>     The Intel 8080 is fairly easy to emulate because the opcode is uniquely
>determined by the first byte of the instruction.  Some instructions might have
>register numbers encoded in a few bits of that first byte, but I just treat
>them as special cases.  To decode the byte, I just multiply its value by 4
>(shift left 2 bits) and use the result as an index into a table of 256 pointers
>to the actual emulation routines.  Since there are 64 possible MOV instructions
>(well, 63 because one of the bit combinations is actually HLT) I actually have
>63 MOV emulations, one for each combination of registers.  This means that I
>don't have to do any register decoding, since each routine consists of a
>dedicated 68000 MOVE instruction, followed by a jump back to the main emulation
>loop.  Lots of almost-redundant code, but it's about as fast as it can get.
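The table-driven decode can be sketched in C like this. It is an illustration under stated assumptions, not SimCPM's 68000 code: the `cpu8080` struct is hypothetical, and only the 49 register-to-register MOVs are generated (the MOV r,M and MOV M,r forms that make up the rest of the 63 are left out). The preprocessor stamps out one dedicated routine per register pair, so no register fields are decoded at run time.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  b, c, d, e, h, l, a;   /* 8080 registers; M = (HL) left out */
    uint16_t pc;
    int      halted;
    uint8_t  mem[65536];
} cpu8080;

typedef void (*handler)(cpu8080 *);

/* Register numbers as encoded in the 8080's 01dddsss MOV opcode. */
enum { R_b = 0, R_c, R_d, R_e, R_h, R_l, R_a = 7 };

/* One dedicated routine per MOV dst,src pair (the C analogue of one
   68000 MOVE per routine). */
#define MOV(D, S) static void mov_##D##_##S(cpu8080 *p) { p->D = p->S; }
#define MOV_ROW(D) MOV(D, b) MOV(D, c) MOV(D, d) MOV(D, e) \
                   MOV(D, h) MOV(D, l) MOV(D, a)
MOV_ROW(b) MOV_ROW(c) MOV_ROW(d) MOV_ROW(e) MOV_ROW(h) MOV_ROW(l) MOV_ROW(a)

#define SET(D, S) t[0x40 | (R_##D << 3) | R_##S] = mov_##D##_##S;
#define SET_ROW(D) SET(D, b) SET(D, c) SET(D, d) SET(D, e) \
                   SET(D, h) SET(D, l) SET(D, a)

static void op_hlt(cpu8080 *p)   { p->halted = 1; }
static void op_undef(cpu8080 *p) { p->halted = 1; }   /* not emulated here */

/* Fill the 256-entry table; C scales the index by the pointer size,
   which is the "multiply by 4" step in the 68000 version. */
void build_table(handler t[256])
{
    assert(t);
    for (int i = 0; i < 256; i++)
        t[i] = op_undef;
    SET_ROW(b) SET_ROW(c) SET_ROW(d) SET_ROW(e) SET_ROW(h) SET_ROW(l) SET_ROW(a)
    t[0x76] = op_hlt;             /* 01 110 110 would be MOV M,M */
}

/* Main loop: fetch an opcode, index the table, call the routine. */
void run(cpu8080 *p, handler t[256])
{
    while (!p->halted) {
        uint8_t op = p->mem[p->pc];
        p->pc = (uint16_t)(p->pc + 1);
        t[op](p);
    }
}
```

Every opcode not filled in falls through to `op_undef`, which simply halts; a complete emulator would have a routine in every slot.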
>
>     This is getting kind of long-winded.  I'd be interested in hearing any
>ideas you may have; although I wouldn't have time to get into the programming
>of such stuff, I'm willing to act as a sounding board.  Talk to you soon... CJG
>
>Sun Mar 3 23:18:48 1991  From: Charlie Gibbs [218]  Subject : Emulators
>
>     Another approach I've heard of is to "compile" the code to be emulated
>into native machine code.  This would involve a front-end program which would
>read the target machine's program and analyze the instructions.  For instance,
>if the "compiler" detects an instruction that does a move between two of the
>emulated machine's registers, it would simply generate a move instruction in
>the emulating machine's code.  It could generate either a translated assembly
>language source file or a machine-language file ready to load into the
>emulating machine.  This would require the "compilation" process to be run once
>on the program to be emulated, and you'd then run the output of this
>"compiler."  There are special tricks to consider here, such as resolving
>addresses - you couldn't just copy the memory addresses across because the
>emulated routines would likely be a different size.  It might be easier to
>generate a label (e.g. Axxxx where xxxx is the hex address in question) in an
>assembly source file and let the emulating machine's assembler sort it all out.
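A toy version of the "compiler" approach, in C. It assumes a hypothetical register mapping (8080 A in d0, B in d1) and handles only MOV A,B and JMP; any other byte is emitted as a `dc.b` data line. For simplicity it labels every instruction Axxxx, where a real translator would label only jump targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Translate a tiny subset of 8080 code into 68000 assembly source.
   Hypothetical register mapping: 8080 A -> d0, 8080 B -> d1.
   Labels stay symbolic so the 68000 assembler resolves them even
   though the translated code is a different size. */
size_t translate(const uint8_t *code, size_t len, uint16_t org,
                 char *out, size_t outsz)
{
    size_t used = 0, i = 0;
    assert(code && out && outsz > 0);
    out[0] = '\0';
    while (i < len) {
        char line[64];
        unsigned addr = (uint16_t)(org + i);
        if (code[i] == 0x78) {                       /* MOV A,B */
            snprintf(line, sizeof line, "A%04X\tmove.b\td1,d0\n", addr);
            i += 1;
        } else if (code[i] == 0xC3 && i + 2 < len) { /* JMP nnnn */
            unsigned target = code[i + 1] | (code[i + 2] << 8);
            snprintf(line, sizeof line, "A%04X\tjmp\tA%04X\n", addr, target);
            i += 3;
        } else {                                     /* not translated */
            snprintf(line, sizeof line, "A%04X\tdc.b\t$%02X\n", addr,
                     (unsigned)code[i]);
            i += 1;
        }
        size_t n = strlen(line);
        if (used + n >= outsz)
            break;                                   /* output buffer full */
        memcpy(out + used, line, n + 1);
        used += n;
    }
    return used;
}
```

Note that the 8080 stores the JMP target low byte first, which the translator reassembles before writing the label.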
>
>     I've never actually seen this process in action, but it's another
>possibility.  --CJG
>
>Mon Mar 4 12:38:31 1991  From: Chris Johnsen [547]  Subject : Emulators
>
>Thanks a lot for your effort in explaining SimCPM to me, man.  As you describe
>it, it would seem that I had intuitively understood the basic concepts.  I
>would think that interrupts would be the hardest part to get down to reliable
>operation.  What I had in mind, while thinking about this in general terms,
>was an emulator that was non-specific as to the machine, so I was
>attempting to contemplate it handling, say, IBM, Mac (hey, it may even be
>possible to deal with Amiga emulation!) and Atari ST on the Amiga, and
>imagining what the various architectures would require.  All this on a very
>abstract level.
>
>Your second message hit the nail on the head!  I got bogged down at about the
>level you describe in your first message.  Lots of details to be sought and
>worked out.  Gee, I'm really not even available to code another program just
>yet anyway.  I was giving the whole concept a rest when I thought of something,
>kind of sideways (lazy minds tend to look for an easier way around an
>obstacle, sometimes unconsciously, even though this can lead to harder, though
>more elegant, solutions to problems): read the opcodes from the "source
>executable" of the emulated machine, producing an assembly listing of the
>program.  This I imagine would be a two-pass process, sort of like a C
>compiler, followed by an assembler's two passes, and finished off with a
>linker.
>
>I thought that, if the compiler was "intelligent" enough, the output, though
>likely larger, would be much faster than the common "interpreter type
>emulator".  I had never heard of such an idea and since there are none out
>there, wanted to discuss this with you.  I have developed the idea no further
>than this in essence.
>
>I did think of a few other considerations however.  If one could, indeed,
>compile an executable image of say Lotus 123 from the IBM into a program which,
>on a base Amiga, could run at half speed, or on a A2500 or A3000 at twice the
>speed, it would be a viable alternative, besides being a neat toy.  However,
>the standalone program generated would likely infringe on the copyright of
>Lotus because the Amiga executable would actually be Lotus 123.  Take
>WordPerfect, for instance.  The latest Amiga version available is 4.1 or just a
>micro-point higher.  No problem: get hold of the IBM version, 5.1 I believe it
>is, compile it, and you have something some other people are wailing for.  Of
>course the rebuttal (I can hear you thinking?) is that, if a person owns
>WordPerfect he has an inalienable right to run it. Run it on an IBM.  Run it on
>a clone.  Run it through an Amiga compiler.  You know, if it's for personal
>use, etc.
>
>As to the increase in size of the "compiled emulation" program, I have a couple
>of ideas.  First, the executable, though larger, would be standalone, except
>for any support libraries.  This doesn't mean that this "form" of emulator,
>more like a "translator of executables", would be any less efficient than the
>"interpreter type".  Perhaps more memory efficient in a couple of possible
>ways.  Since the interpretation section of the program is in the compiler, and
>the source executable is not required at runtime, memory usage may well be less
>with a "compiled emulation".  The second concept is to use link libraries which
>would bind only the emulation routines required to the final program.  Possibly
>a combination of bound-at-link-time modules of less frequently used routines
>and a shared library of essential routines all programs would need.  A solely
>link library approach would leave this concept open to claims that pirates
>could produce "warez" that need no extra code or setup to work.  Of course,
>pirates appear to be capable of ripping anything off anyway!
>
>This "compiled emulation" would, given sufficient memory and CPU
>speed/efficiency, allow the running of multiple programs.  Both emulated
>programs and standard Amiga programs.  Through the use of a shared library more
>than one emulating program could be run without the overhead of multiple
>"emulation interpreters" resident in memory.
>
>The compiler could generate C statements so that you could take advantage of
>the advancements in technology in the compiler, assembler and linker, without
>having to deal directly with those parts of the system.  I know this would
>make the compiler operation more unwieldy.  More operations means it
>would take longer, but theoretically the source is bugless, so you would
>expect the output of this "emulation compiler" to either succeed or fail.
>You'd run the emulator on the program only once.  The beauty of producing
>assembler (C would be better here) is that if it didn't work the first time, a
>programmer type could patch it up in source and get it running.  I'm really
>intrigued by this idea.  Where did you hear about it, do you remember?
>
>My knee-jerk reaction initially was to file the idea, but then I got to
>wondering why no one had done it.  There were many emulators out there for
>various source machines.  Why were none of them compilers?  Another idea I had
>was to contact the dude in Oz (or is he a Kiwi?) that wrote IBeM.  He already
>has the emulation working except for the parallel and serial ports. It would
>appear that he reads the IBM object code, deciphers it and runs a routine, or
>simply does a MOVE using an opcode lookup table, as you suggest; an
>interpreter.  If he instead simply wrote out an instruction in ASCII to do the
>call or move instead, using a shared library of his emulation routines, he'd
>basically have it.  The end user would also have to have an assembler or C
>compiler, however.  This type of approach has got to produce faster emulation,
>if it is possible.  I believe it to be.
>
>Anyway, that's what I had in mind Charlie.  I really do appreciate your
>feedback on this.  Care to comment on any directions you think could be
>followed?  Do you know anyone with enough venture capital to fund the further
>development of this concept? ;-)  Do you think I should approach the author of
>IBeM (cute name) directly?  Or, should this private discussion we've been
>having be moved to Usenet?  Thanks again Charlie!   I appreciate you man.
>
>csj
>
>Mon Mar 4 15:36:53 1991  From: Charlie Gibbs [218] Subject : Emulators
>
>     I can't remember where I first heard of the idea.  The converted code
>won't necessarily be smaller than the original, depending on the relative sizes
>of corresponding machine instructions on both machines.  However, if you could
>make the compiler really smart it might be able to recognize certain sequences
>of instructions and replace them by sequences designed to accomplish the same
>thing more efficiently.  For instance, since the 8080 doesn't have a multiply
>instruction it needs to fake it with a bunch of adds and shifts.  A smart
>compiler, if it could recognize such a routine, could replace it with a single
>68000 multiply instruction and see huge savings.
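A quick C check of the substitution described above: an 8-bit by 8-bit shift-and-add loop, the kind of routine an 8080 program uses in place of a multiply instruction, gives the same 16-bit result as a single multiply (MULU on the 68000). Recognizing such a loop in real 8080 code is the hard part and is not attempted here; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* What an 8080 multiply subroutine computes with adds and shifts:
   one bit of the multiplier per iteration, 16-bit result. */
uint16_t shift_add_mul(uint8_t multiplicand, uint8_t multiplier)
{
    uint16_t product = 0, addend = multiplicand;
    for (int bit = 0; bit < 8; bit++) {
        if (multiplier & 1)
            product = (uint16_t)(product + addend);
        addend = (uint16_t)(addend << 1);
        multiplier >>= 1;
    }
    return product;
}

/* What a smart translator could emit instead: one multiply. */
uint16_t native_mul(uint8_t a, uint8_t b)
{
    return (uint16_t)(a * b);
}

/* Exhaustive check that the substitution is exact for 8-bit operands. */
int substitution_is_exact(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (shift_add_mul((uint8_t)a, (uint8_t)b) !=
                native_mul((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```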
>
>     I'd stay away from calling subroutines; the overhead could kill you.
>
>     The copyright issue could be a sticky one, although I can't see any
>problems if you run the converter on your own copy of the emulated software and
>don't try to sell the result.  It would no doubt be classified as a "derivative
>work".
>
>     Perhaps it might be interesting to throw this discussion out to Usenet. It
>won't be a trivial job, which is probably why we haven't seen it done
>elsewhere.  Remember that a straight machine-code emulation duplicates all the
>register fiddling that is required by the target machine's architecture (and
>the 80x86 family needs a LOT of register fiddling).  This code is replaced by
>the 680x0's own internal fiddling if you're re-compiling source code.  One way
>of looking at it is to decompile the original machine code, then recompile it
>for the new machine.
>
>     Interesting stuff...  CJG
>
>     ------------ End of included text of messages. -----------
>
>Both Charlie Gibbs and myself frequent this newsgroup and look forward to any
>additions to this discussion with which others may respond.  Sorry that the
>posting is so long, but I felt there was little enough chaff contained in the
>messages to warrant including all of them.
>
>csj
>
>The hard way is usually the disguised easy way, you take your choice. Usenet:
>a542@mindlink.UUCP Phone: (604)853-5426 FAX: (604)854-8104
>
>
>From: rjc@pogo.ai.mit.edu (Ray Cromwell)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>Date: 6 Mar 91 01:01:41 GMT
>References: <4992@mindlink.UUCP>
>Organization: None
>
>
>  Very interesting article. I myself have been tempted several times
>to try and write an emulator. Since I programmed 6502 assembly on the
>C64 for 4 years, and I know 68000 on the Amiga, I was tempted to
>try to beat the speed of the other emulators. Then I realized
>the sheer magnitude of the project. Emulating the instruction set is
>easy. In fact, I am quite confident I can make a 6502 emulator run
>faster on the Amiga than on the C64. The hard part is the hardware.
>Most C64 programs discard the OS entirely and bang on the hardware.
>Furthermore, most of them use polling loops, like polling the
>raster beam register and using precisely cycle-timed delays. Moreover,
>the VIC chip contains several glitches that allow programmers to
>use tricks to remove the borders, vertically scroll the screen to
>ANY raster location, horizontally shift the screen, vertically and
>horizontally interlace the screen, and stretch pixels vertically
>(double, triple, quadruple length). This is virtually IMPOSSIBLE to
>detect, unless the emulator is artificially intelligent.  And any
>program that has a fastloader won't work. This is because fastloaders
>usually transfer data over the serial clock line as well as the data
>line. This doubles bandwidth; unfortunately it requires PERFECT timing,
>so perfect in fact that a fastloader written for NTSC machines won't
>work on PAL computers, and vice versa.
>Sprites are another problem, since Amiga sprites are only 16 pixels wide
>(C64 sprites are 24), C64 sprites can have their width and height doubled,
>and they rely on a chunky pixel format.  Text is another problem, since
>the C64 has a built-in text mode.
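To illustrate the chunky-versus-planar mismatch, here is a C sketch converting one byte of C64 multicolour sprite data (four 2-bit pixels, each two hi-res pixels wide, leftmost pixel in bits 7-6) into two Amiga bitplane bytes covering the same eight hi-res pixels. How the four colour indices map onto Amiga sprite colour registers is left as an assumption; index 0 is transparent on both machines.

```c
#include <assert.h>
#include <stdint.h>

/* Split four 2-bit chunky pixels into two bitplanes, doubling each
   pixel horizontally: bit 0 of the colour index goes to plane 0,
   bit 1 to plane 1. */
void mc_byte_to_planes(uint8_t chunky, uint8_t *plane0, uint8_t *plane1)
{
    assert(plane0 && plane1);
    uint8_t p0 = 0, p1 = 0;
    for (int px = 0; px < 4; px++) {
        unsigned v = (chunky >> (6 - 2 * px)) & 3;   /* 2-bit colour index */
        uint8_t mask = (uint8_t)(0xC0 >> (2 * px));  /* both halves of the wide pixel */
        if (v & 1) p0 |= mask;
        if (v & 2) p1 |= mask;
    }
    *plane0 = p0;
    *plane1 = p1;
}
```

An emulator would have to do this for every sprite byte whenever the C64 program changes sprite data, which is part of why sprites are costly to emulate.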
>
>   The Mac is the easiest computer to emulate because it's not a computer
>at all. The Macintosh computer does not exist, it's nothing more
>than a ROM chip.
>
>A few days ago, I was impressed. I downloaded a demo from ab20 called
>C64Music.zap. This demo emulates 6502 at 100% (in fact, it emulates it
>at perfect timing because the music is exactly the same speed.) This
>demo emulates the SID chip PERFECTLY, and I mean perfect. These guys
>should join together with the maker of A64.
>
>
> I can't speak for other 6502 emulators, but if I wrote one, the fastest
>method looks like table lookup, with careful optimization to make
>sure things are long word aligned. For instance, I might do something like
>
>pc6502 equr a5
>accum  equr d5
>xindex equr d6
>yindex equr d7
>stack  equr a4 ; 6502 stack, which is base address + $0100 on the C64
>stat   equr d4 ;status register
>
>
>
>              [allocmem the 6502's address space, load in an
>	        executable and set the pc to its start]
>
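>       lea jumptbl(pc),a2
>loop   moveq #0,d0           ;clear d0 on every pass, since the
>                             ; handlers use it too
>       move.b (pc6502)+,d0   ;fetch the opcode
>       lsl.w #2,d0           ;4 bytes per jump table entry
>       move.l (a2,d0.w),a3   ;fetch the handler address
>       jmp (a3)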
>
>
>then every instruction would be emulated (even the undocumented ones) and
>put into the jumptbl. The code for 'LDA address' might look like:
>
>lda    moveq #0,d0
>       move.b 1(pc6502),d0   ;the 6502 is little-endian: the high
>       lsl.w #8,d0           ; byte of the address comes second,
>       move.b (pc6502),d0    ; the low byte first
>       addq.l #2,pc6502      ;step past the two operand bytes
>       add.l addresspace,d0  ;add the base address of the memory
>                             ; that was AllocMem'd for it
>       move.l d0,a3
>       move.b (a3),accum     ;the MOVE itself sets the 68000 Z flag,
>                             ; so no GetCC() or MOVE SR is needed
>       bne lda_nz
>       bset #1,stat          ;result zero: set the 6502 Z flag
>       bra loop
>lda_nz bclr #1,stat          ;result non-zero: clear the 6502 Z flag
>       bra loop              ;(the N flag is left out here)
>
>(note: this code can be optimized, it's off the tip of my tongue, and
>probably bugged since I haven't coded in asm in a while)
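For comparison, a portable C sketch of the same fetch-and-dispatch scheme with an LDA absolute handler. It is not Ray's code; the `cpu6502` struct, `init_table()` and `step()` are hypothetical. It shows the little-endian operand fetch (low byte first) and the Z and N flag updates that the 68000 version derives from the condition codes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  a, x, y, p;   /* p = status register: Z is bit 1, N is bit 7 */
    uint16_t pc;
    uint8_t  mem[65536];   /* the 6502's 64K address space */
} cpu6502;

typedef void (*op_fn)(cpu6502 *);

enum { FLAG_Z = 0x02, FLAG_N = 0x80 };

static void set_zn(cpu6502 *c, uint8_t v)
{
    c->p = (uint8_t)(c->p & ~(FLAG_Z | FLAG_N));
    if (v == 0)
        c->p |= FLAG_Z;
    if (v & 0x80)
        c->p |= FLAG_N;
}

/* LDA absolute, opcode $AD: the operand is a little-endian address. */
static void op_lda_abs(cpu6502 *c)
{
    uint16_t lo = c->mem[c->pc];
    uint16_t hi = c->mem[(uint16_t)(c->pc + 1)];
    c->pc = (uint16_t)(c->pc + 2);
    c->a = c->mem[(uint16_t)(lo | (hi << 8))];
    set_zn(c, c->a);
}

static op_fn table[256];     /* NULL = opcode not emulated in this sketch */

void init_table(void)
{
    table[0xAD] = op_lda_abs;
}

/* One fetch-and-dispatch step; returns 0 on an opcode this sketch lacks. */
int step(cpu6502 *c)
{
    assert(c);
    uint8_t op = c->mem[c->pc];
    c->pc = (uint16_t)(c->pc + 1);
    if (!table[op])
        return 0;
    table[op](c);
    return 1;
}
```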
>
>From my quick calculations, the jump table dispatcher incurs about a 3-4
>microsecond delay in the fetch of each instruction. This is equivalent
>to about 4 cycles on a 6502 at 1.02 MHz. If you had infinite amounts of RAM,
>the object code loader could 'inline' the code for each instruction
>and get rid of this delay. I believe this is probably how the C64Music
>demo does it, since music players on the C64 were only about 1K of code.
>
>The LDA routine itself looks about 2.2 times slower than a real 6502,
>which takes 4 cycles for it. However, a 25 MHz 68030 would run it more
>than twice as fast.
>
>
>Theoretically speaking, an IBM emulator running on an Amiga 3000
>should run at least at 5 MHz 80286 speed. Consider SoftPC on
>the NeXT, which runs at 12 MHz UNDER UNIX. 68040s are about 2-3 times
>faster than 68030s, so SoftPC on the Amy should run at about 5 MHz.
>
>
> Maybe we should all try something like 'NetIBM'. What I mean is,
>like NetHack, we should all participate in coding an IBM emulator.
>Each person might post a small code segment (in assembler), and the rest
>of us can compete to optimize it. I remember having a contest with
>someone, trying to optimize the 'size' of a binary-to-decimal print
>routine; in the end the code was reduced to about a third of its size.
>(We kept passing the optimized source back and forth, each shedding
>a few bytes.)
>
>
>Regardless of what happens. Let's keep the discussion up, it's interesting
>and educational.
>
>
>       
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1303@macuni.mqcc.mq.oz>
>Date: 6 Mar 91 12:57:34 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <1991Mar6.010141.5905@mintaka.lcs.mit.edu> rjc@pogo.ai.mit.edu (Ray Cromwell) writes:
>>   The Mac is the easiest computer to emulate because it's not a computer
>>at all. The Macintosh computer does not exist, it's nothing more
>>than a ROM chip.
>
>The best description of a Mac I've ever read! :-)
>
>> I can't speak for other 6502 emulators, but if I wrote one, the fastest
>>method looks like table lookup, with careful optimization to make
>>sure things are long word aligned. For instance, I might do something like
>
>How about this?
>
>Write code for every one of the 256 possible opcodes, and make sure that
>it follows these important considerations:
>
>1)  The code is responsible for updating all registers, the 6502PC,
>    and also for checking special functions if a hardware location
>    is accessed.
>2)  It should be 256 bytes per opcode or less.
>3)  It should jump back to STLP (see below) when done.
>4)  It should intelligently introduce timing delays (very hard)
>5)  Something else I've forgotten.
>
>Take each piece of code, and distribute it in a 64K memory area, so that
>the code for opcode n starts at base + (n << 8).  There may be a lot of
>wasted space, but on any reasonable Amiga, this space can be sacrificed
>for speed IMO.
>
>	; Base reg of instruction code in A0.  6502PC in A1.
>
>STLP	MOVE.B	(A1),JMPI+2	; patch the opcode into the high byte
>				; of the JMP's displacement word
>JMPI	JMP	0(A0)		; the 0 is overwritten on every pass
>				; (d16 is signed: $80-$FF go below A0)
>
>
>Excuse my poor 68K code.  I think this will work, but it has been three
>years since I even looked at assembler on the 68000.  Before anyone
>screams about this code failing on a 68030, remember that the
>modification made above is only needed until the JMP.  After that, we do
>not need to see it again.  If I read the info on the 68030 cache
>properly, this should be okay.  At worst, the instruction cache can be
>disabled.
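Two points worth checking in the STLP trick. First, `JMP d16(A0)` sign-extends its 16-bit displacement, so patching opcode n into the high byte yields n*256 only for n < $80; opcodes $80-$FF land below A0. One layout that works, sketched in C below as address arithmetic only (the function names are illustrative), is to point A0 32K into the 64K block and store the handler for opcode n at offset (n XOR $80) * 256. Second, the 68020/68030 instruction cache is not updated by data writes, so once the patched JMP is cached it can execute with a stale displacement; on those CPUs the instruction cache has to be disabled or flushed, as suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement the patched JMP d16(A0) actually uses: the high byte is
   the opcode, the low byte 0, and the 68000 sign-extends the word. */
int32_t jmp_displacement(uint8_t n)
{
    return n < 0x80 ? (int32_t)n * 256 : ((int32_t)n - 256) * 256;
}

/* Layout that matches it: handler for opcode n at (n ^ $80) << 8. */
uint32_t handler_offset(uint8_t n)
{
    return (uint32_t)(n ^ 0x80) << 8;
}

/* Offset of the jump target from the start of the 64K block,
   with A0 = block base + $8000. */
uint32_t jump_target(uint8_t n)
{
    const int32_t a0 = 0x8000;
    int32_t t = a0 + jmp_displacement(n);
    assert(t >= 0 && t < 0x10000);
    return (uint32_t)t;
}
```

With this layout every one of the 256 opcodes still gets its own 256-byte slot, and none of them falls outside the block.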
>
>This should be considerably faster than any table-driven system.  On any
>system with multi-byte opcodes (e.g. the 8088), this system can be easily
>extended, and minor modifications can be made to emulate most modern CPUs
>with reasonable speed.
>
>Now a suggestion for dealing with hardware locations (harder on memory
>mapped systems than on things like Intel and Zilog chips which have I/O
>instructions.)  This is another memory-expensive solution, BTW.
>
>For every location, have two flags, one for read and one for write.  If
>the opcode handlers write to the location, they can check the write flag
>and if it is set, call a handler which determines which location is
>being modified, and pass control to a routine to do it.
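A C sketch of the per-location flag scheme, write side only. The `machine` struct and the handler are hypothetical; $D020 (the C64 VIC-II border colour register) serves as the example hardware location. Ordinary RAM writes cost a single flag test.

```c
#include <assert.h>
#include <stdint.h>

typedef struct machine machine;
typedef void (*io_write_fn)(machine *m, uint16_t addr, uint8_t value);

struct machine {
    uint8_t     mem[65536];
    uint8_t     write_flag[65536]; /* nonzero: writes here need a handler */
    io_write_fn io_write;          /* works out which location it was */
    uint8_t     border_colour;     /* hypothetical emulated VIC-II state */
};

/* Every emulated store goes through here. */
void mem_write(machine *m, uint16_t addr, uint8_t value)
{
    if (m->write_flag[addr]) {
        assert(m->io_write);
        m->io_write(m, addr, value);
    } else {
        m->mem[addr] = value;
    }
}

/* Example handler; a real emulator would update its display here. */
void c64_io_write(machine *m, uint16_t addr, uint8_t value)
{
    if (addr == 0xD020)
        m->border_colour = value & 0x0F;
    m->mem[addr] = value;
}
```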
>
>Now for the VERY memory expensive solution.
>
>Have one 16-bit displacement per memory location (i.e. for 1M of
>simulated RAM you need 3M of real RAM).  This location contains the
>displacement from a base register to a routine which handles that memory
>location.  If this word is non-zero, an instruction that reads or writes
>to it calls this routine.  You could even have two displacements, one
>for read and one for write, but this would only be practical for systems
>that have small amounts of simulated memory (eg. C64).
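A sketch of the handler-per-location variant, for reads. In C it is simpler to store an index into a handler table than a code displacement, so that is the assumption here; 0 means plain RAM. $D012 (the C64 raster line register) is the example location, with a deliberately fake, hypothetical raster counter behind it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t (*read_fn)(uint16_t addr);

enum { H_RAM = 0, H_RASTER, H_COUNT };

static uint8_t  ram[65536];            /* 64K of simulated memory */
static uint16_t read_handler[65536];   /* one 16-bit word per location: 128K */
static uint8_t  raster_line;           /* hypothetical emulated beam position */

/* Hypothetical handler for $D012: each read sees the next line. */
static uint8_t read_raster(uint16_t addr)
{
    (void)addr;
    return raster_line++;
}

static const read_fn handlers[H_COUNT] = { 0, read_raster };

uint8_t mem_read(uint16_t addr)
{
    uint16_t h = read_handler[addr];
    if (h == H_RAM)
        return ram[addr];              /* common case: one test, one load */
    assert(h < H_COUNT && handlers[h]);
    return handlers[h](addr);
}
```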
>
>At a rough estimate, for a C64 emulator (IMO the hardest of all
>emulators.)
>
>	Simulated Memory	 64K
>	Read Displacements	128K
>	Write Displacements	128K
>	Opcode handlers		 64K
>	Memory Location Handlrs	 28K (?)
>	Rest			100K (??)
>				----
>				512K
>
>So, for an average 1M Amiga, this is quite possible.
>
>As for the project, why not start with something much simpler?  An Atari
>2600 emulator.  The hardware is quite simple to simulate, though the
>timing will have to be *very* precise.  This would be a good starting
>point.  Opinions, anyone?
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1304@macuni.mqcc.mq.oz>
>Date: 6 Mar 91 13:52:25 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu> <1303@macuni.mqcc.mq.oz>
>Organization: Macquarie University, Sydney, Australia.
>
>An idea about code compilation, rather than interpretation.
>
>If you assume that a program does not employ self-modifying code, this
>may not be nearly as hard as it first appears (I am not saying that it is
>trivial, just not too bad.)  This assumption can be made if the program
>runs out of ROM, and also in 99% of non-ROM cases.
>
>The problems of compiling code for one CPU into that for another are
>fairly easy, but a major problem occurs when table-driven code is
>employed (common.)
>
>What I am suggesting is this.  It is reasonably elegant, and quite
>simple to perform:
>
>Store the loaded memory image of the original program as well as the
>compiled code.  For every instruction in this image, a pointer to the
>compiled code is generated indicating where the corresponding code
>actually is.  As the program executes, the working store is this memory
>image, and if a piece of data is read, it can be dealt with normally.
>However, if a program attempts to jump, branch or call a location using
>data loaded (as would happen with a table), then the corresponding
>location can be determined and the correct code called.
>
>This has a high memory overhead (3-5 times original image size *plus*
>compiled code).
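A small C sketch of the lookup step described above, assuming each piece of translated code is a C function and that a table maps every original address to its translated code. Ordinary reads still go to the stored image; only a computed jump consults the map. `compiled`, `register_block`, `jump_via_table` and the two stand-in blocks are hypothetical. With 4-byte pointers (as on the 68000), a full map for a 64K image is 256K by itself, which is where much of the memory overhead comes from.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t image[65536];          /* original image, still used for data reads */

typedef uint16_t (*block_fn)(void);   /* translated block; returns the next 6502 PC */
static block_fn compiled[65536];      /* original address -> translated code, NULL if none */

/* Two stand-in translated routines (a real translator would generate these). */
static int hits;
static uint16_t block_c000(void) { hits += 1;  return 0xFFFF; }
static uint16_t block_c010(void) { hits += 10; return 0xFFFF; }

void register_block(uint16_t addr, block_fn f) { compiled[addr] = f; }

/* Jump through a little-endian address stored in a data table at 'entry'.
   The target is only known at run time, so it is resolved through the
   address map. Returns the next PC, or -1 if the target was never
   identified as code (an emulator could fall back to interpreting it). */
int jump_via_table(uint16_t entry)
{
    uint16_t target = (uint16_t)(image[entry] | (image[(uint16_t)(entry + 1)] << 8));
    block_fn f = compiled[target];
    return f ? (int)f() : -1;
}
```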
>
>I am sure that there is a more memory-effective way of doing this by
>identifying such tables during the compile, but it seems a difficult
>problem.  Has anyone got any better solutions?
>
>BTW, an idea for semi-compiled code.  Have two branches for every location,
>and simulate the machine that way.  Reading an opcode calls the
>appropriate opcode handler, writing it calls a reevaluation routine that
>changes the pointers for the two locations.  System and special
>locations are handled similarly.  This may actually be a quite
>reasonable solution, as inelegant as it first sounds.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: rjc@geech.ai.mit.edu (Ray Cromwell)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics
>Message-ID: <1991Mar7.093149.18707@mintaka.lcs.mit.edu>
>Date: 7 Mar 91 09:31:49 GMT
>Organization: None
>
>
> I got an idea. How about combining both methods? (compile and interpret)
>
> First compile the executable with flow analysis into 68k code. Any
>instruction that tries to touch a hardware register will become
>a trap instruction (or use an MMU to trap it) to emulate the hardware.
>
>  Where does the interpreter come in? Any code that tries to self modify
>or do a change of state (table lookup, indirect jumps, etc.) will be
>interpreted.
>
> Sounds hard? You bet. The easiest solution of all is to throw faster
>CPUs at the problem.
>
>
>Now for another question:
>
> Why doesn't AMAX multitask? It should be EASY to run multiple copies
>of the Mac ROM as a task (since it's a ROM, it's mostly pure code, except
>each task would need to alter where the ROM thinks the system memory is.)
>
> For things like the Atari emulator, or another 68000 emulator, the compile
>technique should work great! Just recompile the parts of the code
>that bang on the hardware, or use an MMU to trap them.
>
>
>The whole compilation process is made easier when the machine
>you're trying to emulate has an object code format that contains
>separate data and code chunks, and perhaps some relocation data.
>Furthermore, if the machine has rules against self modifying code,
>and a device independent OS it becomes trivial.
>
>Could someone run SI or dhrystone on IBeM and tell me how it performs?
>It is said that SoftPC on a 25MHz 68030 Mac runs at 6MHz AT speed.
>
>Does anyone know if IBM code uses self modifying code, or
>jump tables? What kind of code does their compilers produce?
>And, does the IBM have a special object code format that separates
>code and data?
>
>
>The compilation technique could work well, but the compiler would have
>to be VERY smart in its flow analysis of detecting code/data, jump
>tables, self modifying code, PC counter and stack manipulation, etc.
>
>I'll clarify my thoughts tomorrow, it's going on 4:30am here, and I
>need to get some well deserved sleep. :-)
>
>
>From: stevek@amiglynx.UUCP (Steve K)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <stevek.3380@amiglynx.UUCP>
>Date: 6 Mar 91 23:12:59 GMT
>References: <4992@mindlink.UUCP>
>Organization: Amiga-Lynx Bbs Multi-Node 201-368-0463 NJ,USA
>
>The idea of a "translator compiler" is very interesting.  I could have sworn I
>saw some talk about a IBM->Atari ST translator that exists on FidoNET.  But
>why is everyone talking about IBM?  Why not Macintosh programs?  Though I am
>not anything close to a professional programmer, I'd imagine programs that run
>on different computers with the same processor would share some op-codes,
>right?  Even if that is not true, they do both have similar GUIs, windows,
>gadgets, and menu bars, etc. which should make the translation easier and more
>like the original.  I sincerely hope someone will pick up and pursue these
>ideas; it would be very beneficial for me and others - I'd love to run a nice
>pascal compiler like Turbo Pascal, and an excellent ANSI editor like TheDraw!
>
>- Steve Krulewitz - Fair Lawn, NJ -
>
>
>From: dclemans@mentorg.com (Dave Clemans @ APD x1292)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar6.212548.9641@mentorg.com>
>Date: 6 Mar 91 21:25:48 GMT
>References: <4992@mindlink.UUCP>
>Organization: Mentor Graphics Corporation
>
>On the "compiled" approach to emulation:
>
>There definitely has been previous work in this area.  The
>system I've heard of was an internal tool that was used experimentally
>to port some games from the Intel world to the 68K world.
>
>To use it, you basically had to develop enough information to
>get a clean disassembly of the Intel code; i.e., so that you "knew"
>where all the code and data was in the source object file.
>That then was used to drive the tool that produced the "compiled"
>68K file.  After that was done you had to go over the output
>for correctness, system dependencies, etc.; it was not intended 
>as a turn key system.
>
>...
>
>As a side issue, to bring over some of the bigger DOS packages
>you'll have to worry about more than just converting a single
>file.  You have to find and convert all of their code overlay files....
>
>dgc
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1312@macuni.mqcc.mq.oz>
>Date: 8 Mar 91 03:51:21 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <1991Mar6.212548.9641@mentorg.com> dclemans@mentorg.com (Dave Clemans @ APD x1292) writes:
>>To use it, you basically had to develop enough information to
>>get a clean disassembly of the Intel code; i.e., so that you "knew"
>>where all the code and data was in the source object file.
>>That then was used to drive the tool that produced the "compiled"
>>68K file.  After that was done you had to go over the output
>>for correctness, system dependencies, etc.; it was not intended 
>>as a turn key system.
>
>Well, on a 6502, it is practically impossible to truly decide what is
>code and what is data.  Let's imagine that you are (as I have done),
>writing a 6502 disassembler.  For every byte you store a status that
>says DATA, OPCODE, ARGUMENT, and also non-exclusive flags (eg.
>BRANCHENTERSHERE, BRANCHISNOTHERE.)  Initially, all bytes are set to DATA,
>and BRANCHISNOTHERE.
>
>Write a recursive procedure that starts at the program entry point, and
>goes through the code.  For every byte read as an opcode, tag it as an
>OPCODE, and the bytes following as ARGUMENT.  If you get to a branch
>instruction, recursively call the new branch point, and continue
>processing that until you hit a byte that has already been processed
>(ie. not tagged DATA.)  You should continue until the whole procedure
>exits, then run the same thing on the RESET, INT and NMI vectors.  Calls
>are treated the same way as branches, except that the routine exits to a
>higher level invocation when it hits an RTS or RTI.
>
>Now, you should have all the data tagged as either program (OPCODE and
>ARGUMENT), or DATA.  Right?  Wrong.  Why?  Because the 6502 has no
>branch always instruction, and your program may continue past what
>appears to be a conditional branch, into data, and screw everything up
>completely.
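The recursive tagging pass described above, sketched in C for a handful of 6502 opcodes (CLC, NOP, LDA #, BNE, BCC, JMP, JSR, RTS, RTI); any other opcode simply ends the path. This is a simplified illustration, not Ian's disassembler: the branch-target flags are left out, and `trace` and `op_len` are hypothetical names. The test program uses CLC followed by BCC as a branch-always (a common 6502 idiom), and the three bytes after the BCC are meant as data, yet they still end up tagged as opcodes, which is exactly the failure described above.

```c
#include <stdint.h>

enum { DATA = 0, OPCODE = 1, ARGUMENT = 2 };

static uint8_t mem[65536];
static uint8_t tag[65536];            /* every byte starts out tagged DATA */

/* Instruction length for a tiny 6502 subset; 0 means "not handled here". */
static int op_len(uint8_t op)
{
    switch (op) {
    case 0x18: case 0xEA: case 0x60: case 0x40: return 1;  /* CLC NOP RTS RTI */
    case 0xA9: case 0xD0: case 0x90:            return 2;  /* LDA # BNE BCC */
    case 0x4C: case 0x20:                       return 3;  /* JMP JSR */
    default:                                    return 0;
    }
}

/* Follow every path from pc, tagging opcode and argument bytes, and stop
   at any byte that has already been tagged. */
void trace(uint16_t pc)
{
    while (tag[pc] == DATA) {
        uint8_t op  = mem[pc];
        int     len = op_len(op);
        if (len == 0)
            return;                                   /* unknown opcode: abandon path */
        tag[pc] = OPCODE;
        for (int i = 1; i < len; i++)
            tag[(uint16_t)(pc + i)] = ARGUMENT;

        uint16_t next   = (uint16_t)(pc + len);
        uint16_t target = (uint16_t)(mem[(uint16_t)(pc + 1)] |
                                     (mem[(uint16_t)(pc + 2)] << 8));
        switch (op) {
        case 0x60: case 0x40:                         /* RTS, RTI: path ends */
            return;
        case 0x4C:                                    /* JMP: no fall-through */
            pc = target;
            continue;
        case 0x20:                                    /* JSR: trace the callee */
            trace(target);
            break;
        case 0xD0: case 0x90:                         /* Bcc: trace the taken side */
            trace((uint16_t)(next + (int8_t)mem[(uint16_t)(pc + 1)]));
            break;
        }
        pc = next;                                    /* ...and also fall through */
    }
}
```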
>
>I experimented with using a two pass approach to this problem.  First,
>the program was scanned sequentially, treating every byte as an opcode,
>and tagging every point referenced by some branch, call, jump or vector.
>Then, when a branch was found during the second recursive pass, the
>program would backtrack and examine each preceding opcode until it hit a
>branch in point (after which no assumptions could be made), to see if
>the flags were left in a deterministic state.  At this point I lost
>interest in the whole idea.
>
>Anyway, on the 6502 and anything without a BRA or equivalent, the
>problem of automatically determining what is data and what is code is
>extremely difficult.
>
>However, on the 68K, this approach is probably quite profitable.  Why?
>Because there is enough correspondence between the 6502 and 68K
>instruction sets (both having the same ancestor, the 6800) to mean that
>the compilation process is reasonably simple.
>
>Simulating the hardware is still a problem, and I'll have to give that
>one some thought...  I still tend to favor the idea that I presented in
>a previous article, carrying around a compiled image (of code only), and
>uncompiled data with labels to the compiled code and handlers for the
>I/O locations.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: drysdale@cbmvax.commodore.com (Scott Drysdale)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <19614@cbmvax.commodore.com>
>Date: 7 Mar 91 20:15:07 GMT
>References: <4992@mindlink.UUCP> <stevek.3380@amiglynx.UUCP>
>Reply-To: drysdale@cbmvax.commodore.com (Scott Drysdale)
>Organization: Commodore, West Chester, PA
>
>In article <stevek.3380@amiglynx.UUCP> stevek@amiglynx.UUCP (Steve K) writes:
>>The idea of a "translator compiler" is very interesting.  I could have sworn I
>>saw some talk about a IBM->Atari ST translator that exists on FidoNET.  But
>>why is everyone talking about IBM?  Why not Macintosh programs?  Though I am
>
>IBM to Atari ST ports should be relatively simple.  The ST essentially runs
>CP/M-68K with several extensions, much like MS-DOS.  So low level calls
>will pretty much translate directly, and you only have to worry about access
>to hardware.
>
>>- Steve Krulewitz - Fair Lawn, NJ -
>
>  --Scotty
>-- 
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>Scott Drysdale           Software Engineer
>Commodore Amiga Inc.     UUCP {allegra|burdvax|rutgers|ihnp4}!cbmvax!drysdale
>		         PHONE - yes.
>"Have you hugged your hog today?"
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
>From: cg@ami-cg.UUCP (Chris Gray)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re:  Emulator Mechanics (sorry long post)
>Message-ID: <cg.7040@ami-cg.UUCP>
>Date: 9 Mar 91 08:11:28 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com> <1312@macuni.mqcc.mq.oz>
>Organization: Not an Organization
>
>In article <1312@macuni.mqcc.mq.oz> ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>writes:
>
>[Discussion of using a recursive approach to run through all branches and
>calls to tag bytes as being code/data.]
>
>>However, on the 68K, this approach is probably quite profitable.  Why?
>>Because there is enough correspondence between the 6502 and 68K
>>instruction sets (both having the same ancestor, the 6800) to mean that
>>the compilation process is reasonably simple.
>
>My Amiga disassembler, Dis (which is available somewhere or other; I believe
>I sent it out last year some time) does this. The things it has trouble
>with are things like function variables. In writing disk-resident libraries,
>a lot of that is done, so it doesn't work too well on such things. It has
>special code to handle case/switch statements generated by several compilers,
>but an assembler program doing a branch table will usually mess it up. I
>don't think there is much hope of an automated translator. Perhaps some
>kind of interactive one, so that the user can aid in identifying what is
>code and what is data.
>
>--
>Chris Gray   alberta!ami-cg!cg	 or   cg%ami-cg@scapa.cs.UAlberta.CA
>
>
>From: finkel@TAURUS.BITNET
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <2406@taurus.BITNET>
>Date: 9 Mar 91 21:26:03 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.004247.8964@cpsc.ucalgary.ca>
>Reply-To: finkel%math.tau.ac.il@CUNYVM.CUNY.EDU (Udi Finkelstein)
>Organization: Tel-Aviv Univesity Math and CS school, Israel
>
>As someone who wrote an 8085 emulator/disassembler/monitor for the C64
>I would like to contribute my own thoughts on the subject.
>
>I have toyed with the idea of writing an IBM PC program translator that
>would take an IBM program and 'translate' it to run on the amiga, but
>after careful examination of the idea I decided to drop it for a few
>reasons:
>
>1. There is no way to find out at compile time which memory references
>are accessing special RAM areas such as the text/graphics video screen.
>
>2. self modifying code breaks such schemes easily
>
>3. Code/data separation can be tough. For example, it's very hard to detect
>whether a memory block contains code (translate it), data (don't touch it)
>or worse - near or far pointers.
>
>(2) is rare, but (1) and (3) are common, so I guess many programs will
>break.
>
>
>Even commercial systems claiming the ability to 'compile' PC binaries
>into UNIX programs such as XDOS (anyone heard of them lately??) aren't
>automatic.
>
>Instead, I decided to try concentrating on the 'usual' type of emulators
>such as IBeM and Transformer.
>
>What caught my attention is that a large portion of the time an emulator
>spends emulating an instruction goes into checking whether a memory
>read/write accesses video memory. Every address written to must be
>checked to see whether it lies in the $BXXXX range, and if it does, the
>value must also be written to the screen.
>
>What I really wanted to do if I had an MMU based machine is to write an
>emulator that will use the MMU to track such memory accesses. The emulator's
>memory will be referenced without translation, but every address in the range
>where the video memory is located will be caught by the MMU and special code
>will be run to handle it. This would speed things up.
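For comparison with the MMU idea, this is roughly what the software check looks like on every store. A sketch assuming a flat 1MB array for the 8088's 20-bit address space and video memory occupying physical addresses $B0000-$BFFFF; `pc_write` and `screen_dirty` are hypothetical names, and the counter stands in for whatever display update a real emulator would do.

```c
#include <stdint.h>

#define VIDEO_START 0xB0000u
#define VIDEO_END   0xBFFFFu

static uint8_t  pc_mem[1u << 20];   /* 1MB 8088 address space */
static unsigned screen_dirty;       /* writes that touched video memory */

/* Every emulated store pays for this comparison, even though almost
   all of them land in ordinary RAM. */
void pc_write(uint32_t addr, uint8_t val)
{
    addr &= 0xFFFFFu;                          /* wrap to 20 bits, as on an 8088 */
    pc_mem[addr] = val;
    if (addr >= VIDEO_START && addr <= VIDEO_END)
        screen_dirty++;                        /* placeholder for updating the display */
}
```

With an MMU, the comparison disappears from the common path: only accesses to the protected video pages would fault into handler code.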
>
>Udi
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1323@macuni.mqcc.mq.oz>
>Date: 10 Mar 91 03:33:49 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.004247.8964@cpsc.ucalgary.ca> <2406@taurus.BITNET>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <2406@taurus.BITNET> finkel%math.tau.ac.il@CUNYVM.CUNY.EDU (Udi Finkelstein) writes:
>>As someone who wrote an 8085 emulator/disassembler/monitor for the C64
>>I would like to contribute my own thoughts on the subject.
>
>An 8085 toolkit on a C64?  The mind boggles.  Why?! :-)
>
>>2. self modifying code breaks such schemes easily
>
>I have yet to see a piece of self-modifying code on an IBM PC, and it
>will cause problems on the many 386DX and 486 systems that have instruction 
>caches without write-through.
>
>>What I really wanted to do if I had an MMU based machine is to write an
>>emulator that will use the MMU to track such memory accesses. The emulator's
>>memory will be referenced without translation, but every address in the range
>>where the video memory is located will be caught by the MMU and special code
>>will be run to handle it. This would speed things up.
>
>Check my article on using lots of memory to very quickly tag this
>occurrence on non-PMMU systems, by vectoring every location to a
>handler.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: ecarroll@maths.tcd.ie (Eddy Carroll)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar9.225807.26560@maths.tcd.ie>
>Date: 9 Mar 91 22:58:07 GMT
>References: <4992@mindlink.UUCP>
>Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
>
>In article <4992@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>writes:
>>
>> [ Interesting discussion about a new sort of PC emulator ]
>
>My final year project for college last year tackled exactly this problem, i.e.
>given a PC executable file, produce an Amiga executable file which will
>perform the same function. The end result worked, in a limited sort of way;
>it was fairly easy for it to be misled by any program of reasonable size.
>
>There are a few tricky problems that must be handled if you want to achieve
>a good degree of success:
>
>  o  Given the instruction MOV reg,VAL do you treat VAL as a constant value
>     or as an offset to an address in one of the PC's segments? The 8086
>     uses the same opcode for both cases, and it is far from easy to work out
>     which meaning to use. It's important to get it right because in the
>     translated program, all the constant values should remain the same
>     but address offsets will be different.
>
>  o  How do you handle jump tables? Most compilers implement switch/case
>     statements with jump tables, but it is not always clear what the bounds
>     of such tables are.
>
>  o  On the PC, byte values can be pushed onto the stack as byte values. On
>     the 68000, you can only push word values. You can see why this might be
>     a problem if you consider a function accessing a parameter at an offset
>     of 8 from the stack pointer. If the preceding parameters were four words,
>     then this is okay. If they were two words and four bytes however, then
>     the offset on the 68000 needs to be 12 to access the same data.
>
>  o  Given that most major PC programs access the PC's video RAM directly
>     for speed reasons, such accesses must be mapped to act equivalently
>     on the Amiga. The problem is that the usual method for doing this is to
>     set a segment register to point to the video RAM and then to access the
>     RAM through this. The same register is likely used for all sorts of
>     other data access elsewhere in the program.
>
>Of course, there are other problems too. The main thing I discovered during
>the course of the project was that it's a lot harder to do translation than
>it might first seem.
>
>My program performs the translation in two passes. The first pass traces out
>all possible paths through the PC code, building up a symbol table and
>separating code from data as it goes. The second pass then walks sequentially
>through the code, producing 68K assembly source code for each line. Certain
>instructions generate macros instead of opcodes, which are later expanded
>to calls to library functions. The resulting source is fed into Charlie's A68K
>and then linked with a library that sets up a PC-like environment (custom
>screen, ibm.font etc.) and handles the INT calls. The result is a standalone
>executable file.
>
>If anyone else has a shot at it, I'd be interested in seeing how it turns out.
>Perhaps a more practical route to take is to produce a translator that can
>take reasonably well written PC source code (say, C or assembly) and
>automatically convert it to Amiga source code. If such a tool existed and
>worked well, it might encourage more companies to port their products to the
>Amiga.
>
>In the meantime, back to IBeM.
>--
>Eddy Carroll           ----* Genuine MUD Wizard  | "You haven't lived until
>ADSPnet:  cbmuk!cbmuka!quartz!ecarroll           |    you've died in MUD!"
>Internet: ecarroll@maths.tcd.ie                  |   -- Richard Bartle
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: 8 methods to emulate a Z80
>Summary: 8 methods to emulate a Z80 on a 68000
>Keywords: Z80 68000 Amiga
>Message-ID: <18847.27d80900@windy.dsir.govt.nz>
>Date: 8 Mar 91 21:58:24 GMT
>Organization: DSIR, Wellington, New Zealand
>
>Ok folks, here are 8 methods for doing z80 emulation on a 68000, in software.
>(Well, 8 methods to get to decode a z80 instruction and get to the right
>emulation routine, anyway.)
>
>Trade-offs are speed, space and cleanliness.  They all fall short of
>"compiling and optimising", but most of these methods will speed up most
>existing emulators.  As you might expect, the largest and dirtiest code is
>usually the fastest (and least portable).  The same methods should work with
>emulation of the 6502, the PDP-11 and other processors with a 16-bit address
>space.
>
>In all methods, I assume there is a 64kb block of memory representing the z80's
>address space, allocated by AllocMem (say), and pointed to by "z80ram".
>
>-------------------------------------------------------------------------------
>Method 1: The "standard" method:
>
>I call this method "standard" because it's used in both of the CP/M z80
>emulators I know about.  The general idea is to decode the current instruction
>and jump to the appropriate emulation routine via a vector table.  That is,
>like a CASE statement with 256 selections.  The code is clean and re-entrant.
>
>; Setup
>	move.l	z80ram,a2		;   load pseudopc
>	lea.l	optabl(pc),a1		;   a1 always points to optabl
>	lea.l	mloop(pc),a3		;   a3 always points to mloop
>
>; Main loop (decode) starts here
>mloop:	moveq	#0,d0			; 4 Clear D0 so its high word is zero.
>	move.b	(a2)+,d0		; 8 Grab the next opcode and inc pc.
>	asl	#2,d0			;10 D0 high word is still zero!
>	move.l	0(a1,d0.w),a0		;18 Get address of routine from table
>	jmp	(a0)			; 8 Do the subroutine.
>					;48 total cycles to decode
>	even
>optabl:	dc.l	nop00,lxib,staxb,inxb,inrb,dcrb,mvib,rlc
>	dc.l	...
>
>Each z80 instruction emulation routine ends with:
>
>	jmp	(a3)
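In C, method 1 corresponds to a loop that fetches an opcode and calls through a 256-entry table of function pointers, with the return from each handler playing the role of `jmp (a3)`. A minimal sketch, assuming a toy CPU state; only INC A (0x3C), HALT (0x76) and NOP (0x00) are filled in, using their real Z80 encodings, and every other entry points at the NOP handler for brevity. `Z80`, `run` and the handler names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t  ram[65536];
    uint16_t pc;
    uint8_t  a;
    int      halted;
} Z80;

typedef void (*op_fn)(Z80 *z);

static void op_nop(Z80 *z)   { (void)z; }
static void op_inc_a(Z80 *z) { z->a++; }          /* flags omitted */
static void op_halt(Z80 *z)  { z->halted = 1; }   /* stops the sketch */

static op_fn optabl[256];

static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        optabl[i] = op_nop;                       /* placeholder for unimplemented opcodes */
    optabl[0x3C] = op_inc_a;
    optabl[0x76] = op_halt;
}

/* Main loop: fetch, bump the pseudo-PC, dispatch through the table. */
void run(Z80 *z)
{
    init_table();
    while (!z->halted) {
        uint8_t op = z->ram[z->pc++];
        optabl[op](z);
    }
}
```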
>
>-------------------------------------------------------------------------------
>Method 2: The "position-independent" method:
>
>This is slightly quicker, the executable is more than 1500 bytes smaller, and
>you get another register to play with in the emulator (a1 in this case).  I
>currently use this method (or close to it) in my Spectrum emulator.  The code
>is clean and re-entrant.
>
>	move.l	z80ram,a2		;   load pseudopc
>	lea.l	mloop(pc),a3		;   a3 always points to mloop
>mloop:	moveq	#0,d0			; 4 clear opcode word
>	move.b	(a2)+,d0		; 8 get opcode byte
>	add.w	d0,d0			; 4 2 bytes per entry
>	move.w	optabl(pc,d0.w),d0	;14 get offset of routine
>	jmp	optabl(pc,d0.w)		;14 do instruction
>					;44 total to decode
>	even
>optabl:	dc.w	nop00-optabl,lxib-optabl,staxb-optabl,inxb-optabl
>	dc.w	inrb-optabl,dcrb-optabl,mvib-optabl,rlc-optabl
>	dc.w	...
>
>Each instruction emulation routine ends with:
>
>	jmp	(a3)
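Method 2 has a close analogue in GNU C's "labels as values" extension (GCC and Clang): store each handler's offset from a base label rather than its absolute address, and jump to base plus offset, much like the `dc.w routine-optabl` entries. A sketch with three toy opcodes (INC A, NOP, HALT); `run_offsets` and the handler labels are hypothetical, and the offset table is built at run time only to keep the example short.

```c
#include <stdint.h>

/* Runs from address 0 until HALT; returns the final value of A. */
int run_offsets(const uint8_t *ram)
{
    /* Offsets of each handler from the label 'base': no absolute
       addresses are stored in the table. */
    int offs[256];
    for (int i = 0; i < 256; i++)
        offs[i] = (int)((char *)&&op_nop - (char *)&&base);   /* unimplemented -> NOP */
    offs[0x3C] = (int)((char *)&&op_inc_a - (char *)&&base);
    offs[0x76] = (int)((char *)&&op_halt  - (char *)&&base);

    uint16_t pc = 0;
    uint8_t  a  = 0;

mloop:
    /* base + offset, like jmp optabl(pc,d0.w) */
    goto *(void *)((char *)&&base + offs[ram[pc++]]);

base:
op_nop:
    goto mloop;
op_inc_a:
    a++;                               /* flags omitted */
    goto mloop;
op_halt:
    return a;
}
```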
>
>-------------------------------------------------------------------------------
>Method 3: The "decode-at-end-of-instruction" method:
>
>(There are really 2 methods described here.)  Take either method 1 or method 2.
>Instead of ending each emulation routine with "jmp (a3)", end each one with a
>complete copy of the code from mloop to the indirect jmp.  There is no longer
>a main loop, because each instruction jumps directly to the next one.
>
>This method is slightly faster, takes maybe twice as much code, is clean, and
>is re-entrant.  It also saves yet another reserved register, in this case a3.
>(Personally, I find that a z80 emulator needs as many free registers as you
>can get your fingers on.)
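Method 3 can be sketched with GNU C's "labels as values" extension (GCC and Clang): each handler ends with its own copy of the fetch-and-dispatch step instead of jumping back to a shared loop, often called threaded dispatch. A sketch with three toy opcodes; `run_threaded` and `DISPATCH` are hypothetical names, and the label table is rebuilt on each call so the code stays correct even if the compiler inlines the function.

```c
#include <stdint.h>

/* Fetch the next opcode, bump the pseudo-PC and jump to its handler. */
#define DISPATCH() goto *optabl[ram[pc++]]

/* Runs from address 0 until HALT; returns the final value of A. */
int run_threaded(const uint8_t *ram)
{
    void *optabl[256];
    for (int i = 0; i < 256; i++)
        optabl[i] = &&op_nop;          /* placeholder for unimplemented opcodes */
    optabl[0x3C] = &&op_inc_a;         /* INC A (flags omitted) */
    optabl[0x76] = &&op_halt;          /* HALT ends the sketch */

    uint16_t pc = 0;
    uint8_t  a  = 0;

    DISPATCH();                        /* there is no central loop to return to */

op_nop:
    DISPATCH();
op_inc_a:
    a++;
    DISPATCH();
op_halt:
    return a;
}
```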
>
>-------------------------------------------------------------------------------
>Method 4: The "threaded jsr's" method:
>
>Warning: This method uses self-modifying, non-re-entrant code, and therefore
>is not recommended.  This code is hazardous to your cache!  (No flames please
>--- read on).
>
>Introduce a 390kb contiguous block of code (called thread) which looks like
>this:
>
>thread:		jsr	patch	; 0
>		jsr	patch	; 1
>		...
>		jsr	patch	; 65535
>		jmp	thread
>
>That is, there is a jsr instruction for each byte in the z80's address space.
>This is in addition to z80ram.
>
>To start the emulator, you transfer control to "thread".  What the "patch"
>routine does is to replace the current "jsr patch" with "jsr this_routine",
>where this_routine is the emulation routine for the corresponding opcode in
>z80ram.  Then patch jmps to the this_routine to execute the instruction and to
>return to the next jsr in the thread.  After a while, patch will no longer be
>called (except by z80 self modifying code), and every jsr made will be to
>emulate a z80 opcode directly.
>
>Whenever a z80 instruction writes to RAM, it patches the corresponding
>"jsr this_routine" with "jsr patch".  As a variation, it could patch
>"jsr this_routine" with "jsr new_routine", but that would probably be slower
>in general.
>
>Advantage:
>
>It would be faster than methods 1 to 3, --- I think, --- especially in the
>Spectrum emulator, which has to do a lot of work with every write to RAM to
>check for ROM and video RAM anyway.  The main reason for the extra speed is
>that it no longer has to decode the opcode on every instruction.  There are
>the extra overheads of call and return though, and extra work to do on every
>RAM write.
>
>Disadvantages:
>
>1: The code breaks C='s self-modifying code law.  To run on Amigas with
>caches, it would have to either disable the caches or update them manually
>after every patch.  The code is extremely dirty, not re-entrant, and
>definitely not recommended;
>
>2: You need 390k contiguous memory (plus another 64k somewhere else, plus
>whatever else you need for video).
>
>Other characteristics:
>
>Code would run slowly the first time round the loop, then speed up.
>
>--------------------------------------------------------------------------
>Method 5: The "replicated code" method.
>
>Warning: This also uses self-modifying, non-re-entrant code and is therefore
>not recommended.
>
>Thread consists of 65536 blocks of code, each long enough to emulate the
>trickiest z80 instruction.  Initially it contains 65536 copies of patch.  (You
>will need A LOT of contiguous memory.)  What patch does is to actually copy
>the code for the opcode over itself, then transfer control to the beginning of
>itself.  (Tricky, but it can be done.)  Every emulation routine finishes with
>a "bra.s next_instr" so they are all really the same length.  That saves the
>call and return overhead.
>
>If an emulation routine is too long, then just use a jmp to somewhere.  In
>practice, you would probably start with:
>
>	jsr	patch
>	bra.s	next_instr
>
>in every slot, rather than a complete copy of patch.  Z80 RAM writes would
>copy the above code to the corresponding slot, if necessary, rather than
>copying the whole patch routine.
>
>Short of "compiling and optimising", this is the fastest method I can think of,
>but it is incredibly space-wasting, self-modifying, extremely dirty, and
>definitely not recommended.
>
>--------------------------------------------------------------------------
>Method 6: The "threaded vector table" method:
>
>Ok, now to fix the self-modifying code problem.  Take method 4 (threaded jsr's),
>but use a 262kb vector table in a private data segment, instead of a thread in
>the code segment.
>
>vectors:	dc.l	patch	; 0
>		dc.l	patch	; 1
>		...
>		dc.l	patch	; 65535
>		dc.l	jmp_thread
>
>The main instruction loop looks like:
>
>		lea.l	vectors,a0
>		lea.l	mloop(pc),a2
>mloop:		move.l	(a0)+,a1	;12 cycles
>		jmp	(a1)		; 8 cycles
>
>and every instruction finishes with "jmp (a2)".  A0 is acting as a "pseudo-pc"
>into the vector table.  Of course patch performs the same functions as before
>(except it is no longer self modifying, it just patches a vector).  The vector
>table still needs to be updated by every write to Z80 RAM.  The code is
>re-entrant provided each task has a separate copy of the vector table.
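Method 6 translates fairly directly into C, since a table of function pointers needs no self-modifying code. A sketch, assuming a toy subset of Z80 opcodes (LD A,n; LD (nn),A; INC A; NOP; HALT) whose handlers advance the PC themselves; `patch`, `z80_write`, `z80_run` and the `decodes` counter are hypothetical. Because a Z80 instruction can be up to four bytes long, the sketch's store routine also resets the vectors for the three preceding addresses, in case the byte written is an operand rather than an opcode.

```c
#include <stdint.h>

typedef struct Z80 Z80;
typedef void (*op_fn)(Z80 *z);

struct Z80 {
    uint8_t  ram[65536];
    op_fn    vec[65536];     /* one vector per address, like the 262kb table */
    uint16_t pc;
    uint8_t  a;
    int      halted;
    unsigned decodes;        /* how often patch() had to decode an opcode */
};

static void patch(Z80 *z);

/* Every store resets the vectors that might cover the written byte. */
void z80_write(Z80 *z, uint16_t addr, uint8_t val)
{
    z->ram[addr] = val;
    for (int back = 0; back < 4; back++)
        z->vec[(uint16_t)(addr - back)] = patch;
}

static void op_nop(Z80 *z)    { z->pc += 1; }
static void op_inc_a(Z80 *z)  { z->a++; z->pc += 1; }          /* flags omitted */
static void op_halt(Z80 *z)   { z->halted = 1; z->pc += 1; }
static void op_ld_a_n(Z80 *z) { z->a = z->ram[(uint16_t)(z->pc + 1)]; z->pc += 2; }
static void op_ld_nn_a(Z80 *z)                                  /* LD (nn),A */
{
    uint16_t nn = (uint16_t)(z->ram[(uint16_t)(z->pc + 1)] |
                             (z->ram[(uint16_t)(z->pc + 2)] << 8));
    z->pc += 3;
    z80_write(z, nn, z->a);
}

static op_fn decode(uint8_t op)
{
    switch (op) {
    case 0x3C: return op_inc_a;
    case 0x3E: return op_ld_a_n;
    case 0x32: return op_ld_nn_a;
    case 0x76: return op_halt;
    default:   return op_nop;       /* placeholder for unimplemented opcodes */
    }
}

/* Decode once, remember the handler in the vector table, then run it. */
static void patch(Z80 *z)
{
    op_fn f = decode(z->ram[z->pc]);
    z->vec[z->pc] = f;
    z->decodes++;
    f(z);
}

void z80_reset(Z80 *z)
{
    for (long i = 0; i < 65536; i++)
        z->vec[i] = patch;
    z->pc = 0; z->a = 0; z->halted = 0; z->decodes = 0;
}

void z80_run(Z80 *z)
{
    while (!z->halted)
        z->vec[z->pc](z);           /* no decoding once the vector is patched */
}
```

On a second pass over the same code, only the addresses invalidated by a store are decoded again.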
>
>--------------------------------------------------------------------------
>Method 7: The "position-independent threaded vector table" method:
>
>Same as method 6, except that now the private data segment is:
>
>thread:		dc.w	patch-base	; 0
>		dc.w	patch-base	; 1
>		...
>		dc.w	patch-base	; 65535
>		dc.w	jmp_thread-base
>
>and the main loop is:
>
>		lea.l	thread,a0
>		lea.l	mloop(pc),a1
>mloop:		move.w	(a0)+,d0	; 8 cycles
>		jmp	base(pc,d0.w)	;14 cycles
>base:
>patch:		...
>op00:		...
>op01:		...
>jmp_thread:	...
>
>Now it is position-independent, only 128kb contiguous memory, the executable
>is 1500 bytes smaller, and it is slightly slower (only by 2 cycles per z80
>instruction though).  The code is re-entrant provided each task has a separate
>copy of the vector table.
>
>--------------------------------------------------------------------------
>Method 8: The "decode-at-end-of-instruction threaded vector table" method:
>
>Same as method 6 except that every opcode emulation routine finishes with:
>
>		move.l	(a0)+,a1
>		jmp	(a1)
>
>instead of "jmp (a2)".  Now isn't that faster?  And it saves a2 for more
>important things.
>
>Unfortunately you can't do exactly the same thing to method 7 unless you can
>write a complete z80 emulator in 256 bytes  8-) .  But you could take method 7
>and end each emulation routine with:
>
>mloop:		move.w	(a0)+,d0
>		lea.l	base(pc),a1
>		jmp	0(a1,d0.w)
>
>instead.  The code is re-entrant provided each task has a separate copy of
>the vector table.
>
>--------------------------------------------------------------------------
>Personally I'm considering using one of the methods 6, 7 or 8 in the next
>version of the Spectrum emulator (probably method 8)  (That is, if I ever get
>enough spare time without more interesting things to do.)  I'll probably make
>the source public domain.  That will use more Amiga RAM, but should go faster
>(I hope).  Any guesses as to which method will be the fastest, and still fit
>comfortably in a 512k machine?
>
>Unfortunately I don't think any of the methods (except the first 3) are
>suitable for an 8088 emulator because of the huge memory requirements.
>
>I'm interested in any ideas anyone might have along these lines.  The
>discussion of "compiling and optimising" is very interesting, but I don't see
>how the details would work.  In particular, how do you cope with self-modifying
>code, code loaders, overlays etc?
>
>
>Peter McGavin.  (srwmpnm@wnv.dsir.govt.nz)
>
>Disclaimer:  I haven't tested any of the above ideas (except 1 and 2).  If you
>see any bugs, point them out.
>
>
>From: daveh@cbmvax.commodore.com (Dave Haynie)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics
>Message-ID: <19749@cbmvax.commodore.com>
>Date: 11 Mar 91 20:25:55 GMT
>References: <1991Mar7.093149.18707@mintaka.lcs.mit.edu>
>Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
>Organization: Commodore, West Chester, PA
>
>In article <1991Mar7.093149.18707@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>
>> Why doesn't AMAX multitask? It should be EASY to run multiple copies
>>of the Mac ROM as a task (since it's a ROM, it's mostly pure code, except
>>each task would need to alter where the ROM thinks the system memory is.)
>
>Without massive patching of the Mac ROM, I don't think so.  The same reason
>you don't have more sophisticated multitasking on the Mac itself.  You can't
>run multiple copies of the same Mac ROM code, since the code is not reentrant
>like Amiga code.  Without some clever MMU tricks, I don't think you could 
>easily relocate things such that several different copies of the ROM code
>could coexist.  At least, you would expect Apple to have considered any of
>the more mundane tricks available on any 680x0 system.  Apple does seem to have
>decent technical folks, I doubt they missed any easy tricks...
>
>Via MMU, you could certainly get further.  Multifinder didn't do that, since
>it would have required an MMU based system for any multitasking.  I'm surprised
>Apple didn't work out anything like that for their MacOS-under-UNIX, though.
>
>>Does anyone know if IBM code uses self modifying code, or
>>jump tables? 
>
>Self modifying code and other uglies are very prevalent in MS-DOS code.  That's
>the main reason Intel went for a unified code/data cache for the 80486. 
>Separate I and D caches with internal Harvard architecture (like the '030 and
>'040) can yield a much faster system, all else being equal.  But that extra
>performance would not have been worth tossing out what was apparently a large
>portion of the MS-DOS programs out there.
>
>MS-DOS programs, more and more often, think they're the operating system.  
>Every program is responsible for managing differences between CPUs, managing
>memory (beyond the 8088 model), graphics, etc.  
>
>-- 
>Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
>   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
>	"What works for me might work for you"	-Jimmy Buffett
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: 8 methods to emulate a Z80
>Summary: 4 more methods to emulate a Z80
>Keywords: Z80 68000 Amiga Emulator
>Message-ID: <18850.27dbf20e@windy.dsir.govt.nz>
>Date: 11 Mar 91 21:09:34 GMT
>References: <18847.27d80900@windy.dsir.govt.nz>
>Organization: DSIR, Wellington, New Zealand
>
>Here are 4 more methods for doing z80 instruction decoding on a 68000.  These
>methods are all based on fixed-size instruction emulation routines.
>
>There are some general comments, on handling multiple-byte opcodes, writing to
>hardware registers, and flag handling, at the end of this post.
>
>These 4 methods are faster than methods 1..3 (see previous post), they do not
>have the RAM-write overheads of methods 4..8, and they do not require tables of
>opcode routine address offsets.  Some of these methods might be superior to all
>previous methods.
>
>In all these methods, each z80 instruction emulation routine is fixed at 256
>bytes in length.  (See earlier post by Ian Farquhar.)  They are coded in the
>sequence op_80, op_81, ... op_ff, op_00, op_01, ... op_7f.  So there is exactly
>64k of code.  If a routine is shorter than 256 bytes, the extra space is
>wasted.  If a routine is longer than 256 bytes then it will need a jsr to
>somewhere.
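The point of coding op_80..op_ff before op_00 is that the opcode, read as a signed byte and multiplied by 256, always lands within the -32768..32767 range of a 68000 signed 16-bit displacement from op_00. A minimal C sketch of that arithmetic, assuming the layout described above (the function names are illustrative, not from the post):

```c
#include <stdint.h>

/* Offset of an opcode's 256-byte routine relative to op_00, assuming the
   layout op_80..op_ff, op_00..op_7f, so op_00 sits 32k into the 64k block. */
int32_t routine_offset(uint8_t opcode)
{
    /* Treat the opcode as a signed byte (-128..127); each slot is 256 bytes. */
    int32_t signed_op = (opcode < 0x80) ? opcode : (int32_t)opcode - 256;
    return signed_op * 256;   /* always fits a signed 16-bit displacement */
}

/* Position of the same routine measured from the start of the 64k block. */
uint32_t routine_position(uint8_t opcode)
{
    return (uint32_t)(routine_offset(opcode) + 32768);
}
```

So op_80 is at the start of the block (offset -32768 from op_00), op_ff is 256 bytes below op_00, and op_7f is the last slot.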
>
>-------------------------------------------------------------------------------
>Method 9: The "self-modifying fixed-size routine" method:
>
>Warning: Self modifying code follows.
>
>This method is almost the same as in Ian Farquhar's earlier c.s.a.e post.
>
>setup:	lea.l	op_00(pc),a1		; a1 always points to op_00
>	move.l	z80ram,a2		; load pseudopc
>
>mloop:	move.b	(a2)+,1$-op_00+2(a1)	;16 patch $ff in jmp instruction, inc pc
>1$:	jmp	$ff00(a1)		;10 Jump to routine.
>					;26 total cycles
>
>Every instruction emulation routine ends with a copy of mloop, rather than a
>jump back to mloop.
>
>The move.b patches the high byte of the offset in the second instruction, so
>the jump goes to the right routine.  (The patch address is 1$-op_00+2: "jmp
>d16(a1)" is a 2-byte opcode word followed by the 16-bit displacement, and the
>68000 is big-endian, so the displacement's high byte is at 1$+2.)
>
>The code is extremely fast (maybe the fastest yet), but it does not work on
>Amigas with an instruction cache (68020 and up), which may still hold the
>unpatched jmp.
>
>I thought of making it even faster by permanently setting a4 to the address of
>the byte to patch, and then using:
>
>mloop:	move.b	(a2)+,(a4)		;12 Patch $ff in jmp instruction, inc pc
>	jmp	$ff00(a1)		;10 Jump to routine.
>					;22 total cycles
>
>but you run out of reserved registers if there are lots of copies of mloop.
>(I.e., you can't use the decode-at-end-of-instruction technique.)  Using
>"jmp (a3)" at the end of every routine makes it slower.
>
>-------------------------------------------------------------------------------
>Method 10: The "standard fixed-size routine" method:
>
>Now we eliminate self-modifying code.
>
>The main loop is coded into the wasted space at the end of op_ff (so that it is
>within 128 bytes of op_00 --- Remember that op_00 is in the middle of the 64k
>code block).
>
>setup:	subq.l	#2,sp			; make room for scratch area on stack
>	clr.w	(sp)			; low byte of scratch area is always 0
>	lea.l	mloop(pc),a3		; a3 always points to mloop
>	move.l	z80ram,a2		; load pseudopc
>
>mloop:	move.b	(a2)+,(sp)		;12 Opcode to scratch high byte, inc pc
>	move.w	(sp),d0			; 8 high byte is opcode, low byte is 0
>	jmp	op_00(pc,d0.w)		;14 Jump to routine.
>
>Each z80 instruction emulation routine ends with:
>
>	jmp	(a3)			; 8
>					;42 total cycles
>
>Unfortunately it's quite a bit slower.  We can do better...
>
>-------------------------------------------------------------------------------
>Method 11: The "decode-at-end-of-instruction fixed-size routine" method:
>
>Register a3 always points to op_00 instead of to mloop, and we have:
>
>setup:	subq.l	#2,sp			; make room for scratch area on stack
>	clr.w	(sp)			; low byte of scratch area is always 0
>	lea.l	op_00(pc),a3		; a3 always points to op_00
>	move.l	z80ram,a2		; load pseudopc
>
>mloop:	move.b	(a2)+,(sp)		;12 Opcode to scratch high byte, inc pc
>	move.w	(sp),d0			; 8 high byte is opcode, low byte is 0
>	jmp	0(a3,d0.w)		;14 Jump to routine.
>					;34 total cycles
>
>Each z80 instruction emulation routine ends with a copy of the decode routine.
>This is faster than method 10, and mloop can be coded anywhere.
>
>Can we avoid using scratch memory and still be as fast?  Think about how you
>might do this before you read on.  An 8-bit shift of register d0 avoids using
>scratch memory, but is slower (on a plain 68000).  The next method shows how to
>make the decode faster and avoid using scratch memory, but it (possibly)
>introduces overhead elsewhere.
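The scratch-word trick in methods 10 and 11 works because the 68000 is big-endian: a byte stored at (sp) becomes the high byte of the word read back from (sp). A rough C model of that (not 68000 code; function names are hypothetical), alongside the 8-bit shift alternative mentioned above:

```c
#include <stdint.h>

/* Model of "clr.w (sp); move.b (a2)+,(sp); move.w (sp),d0". */
uint16_t decode_via_scratch(uint8_t opcode)
{
    uint8_t scratch[2] = { 0, 0 };   /* clr.w (sp): low byte stays 0 */
    scratch[0] = opcode;             /* move.b to (sp) writes the high byte */
    /* move.w (sp),d0: assemble the word big-endian, as the 68000 does */
    return (uint16_t)((scratch[0] << 8) | scratch[1]);
}

/* The alternative from the text: an 8-bit shift of d0 (slower on a 68000). */
uint16_t decode_via_shift(uint8_t opcode)
{
    return (uint16_t)(opcode << 8);
}
```

Both give opcode * 256, which is exactly the offset of the opcode's routine when every routine is padded to 256 bytes.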
>
>-------------------------------------------------------------------------------
>Method 12: The "stretched-address-space fixed-size-routine" method:
>
>This method assumes that the z80 address space (z80ram) is stretched to 128k,
>so that each byte in the z80's address space takes up a word in the Amiga.
>The low order byte of every word must always be 0.
>
>setup:	move.l	z80ram,a2		; load pseudopc
>	lea.l	op_00(pc),a3		; a3 always points to op_00
>
>mloop:	move.w	(a2)+,d0		; 8 Opcode to d0 high byte, inc pc
>	jmp	0(a3,d0.w)		;14 Jump to routine.
>					;22 total cycles
>
>For best results, every routine ends with the mloop code (decode at end of
>instruction).  The instruction decode is faster than method 11, but now many
>instructions will have extra work to do to convert byte z80 addresses to word
>amiga addresses.  Still, this code looks good enough to try.
>
>Miscellaneous hint #9: To convert a byte offset to a word offset, use
>"add.w d0,d0", not "lsl.w #1,d0".
>
>Another miscellaneous hint: Maybe there's a use for movep here.
>
>You could maintain 2 copies of the z80 address space --- one in 64k and the
>other in 128k.  Then it's just a simple matter of writing a byte to both places
>whenever the z80 does a write.  That gets rid of the overhead of converting
>between offset types on memory reads.
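A minimal C sketch of the two-copies idea, assuming a plain 64k image for data reads and a stretched image (one 16-bit word per z80 byte, low byte always 0) for opcode fetch; the type and function names are hypothetical:

```c
#include <stdint.h>

typedef struct {
    uint8_t  bytes[65536];      /* 64k copy: ordinary data reads */
    uint16_t stretched[65536];  /* 128k copy: opcode in the high byte */
} z80_memory;

/* Every z80 write updates both copies, so reads never need converting. */
void z80_write(z80_memory *m, uint16_t addr, uint8_t value)
{
    m->bytes[addr] = value;
    m->stretched[addr] = (uint16_t)(value << 8);
}

uint8_t z80_read(const z80_memory *m, uint16_t addr)
{
    return m->bytes[addr];
}

/* Opcode fetch, like "move.w (a2)+,d0": yields opcode * 256 directly.
   The pseudo-pc wraps at 16 bits. */
uint16_t z80_fetch_offset(const z80_memory *m, uint16_t *pc)
{
    return m->stretched[(*pc)++];
}
```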
>
>But now our method is starting to look like threaded code (method 8) again.
>The threaded code method uses the 128k block to store the offset to the
>handling routine, rather than storing the opcode itself.  The overhead in doing
>a memory write is the same in both methods, and threaded code has other
>advantages (like not having to pad code to 256 bytes, multiple-byte opcodes
>handled better).  So we're back to threaded code again.
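A minimal C sketch of threaded code under stated assumptions: where the post stores a 16-bit offset per z80 address, this stores a function pointer for clarity, and only two opcodes (NOP and LD A,n) are modelled; all names are hypothetical.

```c
#include <stdint.h>

typedef struct cpu cpu;
typedef void (*handler_fn)(cpu *);

struct cpu {
    uint16_t   pc;
    uint8_t    a;
    uint8_t    mem[65536];
    handler_fn thread[65536];   /* per-address pointer to the handling routine */
};

static void op_nop(cpu *c)    { c->pc++; }
static void op_ld_a_n(cpu *c) { c->a = c->mem[(uint16_t)(c->pc + 1)]; c->pc += 2; }
static void op_other(cpu *c)  { c->pc++; }   /* placeholder for everything else */

static handler_fn opcode_table[256];

void cpu_init(cpu *c)
{
    for (int i = 0; i < 256; i++) opcode_table[i] = op_other;
    opcode_table[0x00] = op_nop;
    opcode_table[0x3E] = op_ld_a_n;             /* LD A,n */
    for (int addr = 0; addr < 65536; addr++)    /* thread the whole image */
        c->thread[addr] = opcode_table[c->mem[addr]];
    c->pc = 0;
}

/* The write overhead: every store also refreshes the thread entry. */
void mem_write(cpu *c, uint16_t addr, uint8_t value)
{
    c->mem[addr] = value;
    c->thread[addr] = opcode_table[value];
}

/* No decode at all: jump straight through the thread entry for pc. */
void cpu_step(cpu *c)
{
    c->thread[c->pc](c);
}
```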
>
>-------------------------------------------------------------------------------
>A note on multiple-byte opcode instructions:
>
>The z80 uses these.  They are prefixed with $cb, $dd, $ed and $fd.  There are
>also $ddcb and $fdcb prefixed instructions.
>
>For fixed-size methods, (methods 9..12), the fastest way to cope with long
>instructions is to use more tables.  But that means multiplying the number of
>tables by 5 or 7.  64k of code has just jumped to 320k or 448k.  That's no good
>on a small machine.  Also, if you reserve a register to point to op_00 in each
>table, that's 5 or 7 registers gone.  Oops.
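The seven tables come from the decode step below. A C sketch of just the table selection (prefix chains such as $dd $dd are deliberately not handled, and the names are hypothetical); note that for $ddcb/$fdcb the final opcode byte comes after the displacement:

```c
#include <stdint.h>

/* The seven 256-entry tables: base, $cb, $ed, $dd, $fd, $ddcb, $fdcb. */
enum table_id { T_BASE, T_CB, T_ED, T_DD, T_FD, T_DDCB, T_FDCB };

typedef struct {
    enum table_id table;   /* which 256-entry table to index */
    uint8_t       opcode;  /* index into that table */
    int           length;  /* bytes up to and including the final opcode */
} decoded_op;

decoded_op z80_decode(const uint8_t mem[65536], uint16_t pc)
{
    uint8_t b0 = mem[pc];
    uint8_t b1 = mem[(uint16_t)(pc + 1)];
    decoded_op d = { T_BASE, b0, 1 };

    switch (b0) {
    case 0xCB: d.table = T_CB; d.opcode = b1; d.length = 2; break;
    case 0xED: d.table = T_ED; d.opcode = b1; d.length = 2; break;
    case 0xDD:
    case 0xFD:
        if (b1 == 0xCB) {
            /* prefix, $cb, displacement, then the opcode */
            d.table  = (b0 == 0xDD) ? T_DDCB : T_FDCB;
            d.opcode = mem[(uint16_t)(pc + 3)];
            d.length = 4;
        } else {
            d.table  = (b0 == 0xDD) ? T_DD : T_FD;
            d.opcode = b1;
            d.length = 2;
        }
        break;
    default:
        break;
    }
    return d;
}
```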
>
>-------------------------------------------------------------------------------
>Some notes on threaded code:
>
>I spent last evening trying threaded code (method 8) in the Spectrum emulator. 
>(See previous post.)  I got a 20..40% speed improvement over the position-
>independent standard method (256-way CASE statement).  It's still several times
>slower than a real Spectrum, unfortunately.  There is some slow code in places
>where there wasn't before, so there is scope for more improvement.
>Unfortunately I introduced some bugs during the systematic changes, and they
>are proving hard to track down.  Everything in the Spectrum ROM seems to work
>ok.  Some machine-code programs that worked before have stopped working.
>
>Threaded code is extremely fast for multiple-opcode instructions.  Control is
>vectored directly to the right routine first time, without having to decode
>multiple tables.  A problem with this is that if a Spectrum program overwrites
>the second opcode of a multiple-byte instruction ($cb, $dd, $ed, $fd) without
>writing to the first byte, then the emulator doesn't cope.
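One possible workaround, sketched in C under stated assumptions and not taken from the post: on every write, re-decode every address whose instruction could include the written byte. Because $ddcb/$fdcb put the final opcode 3 bytes after the prefix, that means addr-3 through addr, at the cost of four decodes per write. The handler numbering (prefix group in the high bits) and all names are hypothetical.

```c
#include <stdint.h>

static uint8_t  mem[65536];
static uint16_t thread[65536];   /* hypothetical handler number per address */

/* Handler number: prefix group in bits 8..10, final opcode in bits 0..7. */
static uint16_t decode_at(uint16_t a)
{
    uint8_t b0 = mem[a], b1 = mem[(uint16_t)(a + 1)];
    switch (b0) {
    case 0xCB: return 0x100 | b1;
    case 0xED: return 0x200 | b1;
    case 0xDD: return b1 == 0xCB ? 0x500 | mem[(uint16_t)(a + 3)] : 0x300 | b1;
    case 0xFD: return b1 == 0xCB ? 0x600 | mem[(uint16_t)(a + 3)] : 0x400 | b1;
    default:   return b0;
    }
}

void mem_write(uint16_t addr, uint8_t value)
{
    mem[addr] = value;
    /* Re-decode any instruction that could start at addr-3 .. addr. */
    for (int back = 3; back >= 0; back--) {
        uint16_t a = (uint16_t)(addr - back);
        thread[a] = decode_at(a);
    }
}
```

Re-decoding addresses that actually hold data is harmless: the thread entry is simply never jumped through.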
>
>-------------------------------------------------------------------------------
>Note on hardware registers:
>
>Handling writes to hardware registers in the Spectrum isn't really very hard.
>The z80 has a separate IO address space with a separate set of instructions for
>handling it.  (The same is true of the 8088.)  The only thing to watch for is
>writing to video RAM.  It is fixed size and at a fixed place, so it takes 2
>tests.  (There doesn't seem to be a faster way of doing a single bit test ---
>the video RAM doesn't end on a power-of-2 boundary.)  I have a separate task
>(sort of) which uses the blitter to keep the screen up-to-date with the video
>RAM.  So when there is a write to video RAM, the emulator task doesn't have to
>do much.  It just flags the blitter task "Hey, there's something to update in
>character row n, when you wake up".  The blitter task doesn't slow the emulator
>down much, because it's mostly running on another processor, and it sleeps when
>there's nothing to update.
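A C sketch of the two-test video RAM check, assuming the 48K Spectrum layout (bitmap at $4000-$57FF, attributes at $5800-$5AFF, 24 character rows of 32 columns). The dirty-row bitmask and names are hypothetical; how the blitter task reads and clears it safely is not shown.

```c
#include <stdint.h>

#define VRAM_START 0x4000u
#define VRAM_END   0x5B00u   /* one past the last attribute byte */
#define ATTR_START 0x5800u

static uint32_t dirty_rows;   /* bit n set: character row n needs redrawing */

static unsigned char_row_of(uint16_t addr)
{
    if (addr >= ATTR_START)
        return (addr - ATTR_START) >> 5;   /* 32 attribute bytes per row */
    /* bitmap address bits: 010T TSSS LLLC CCCC
       (TT = screen third, SSS = pixel line, LLL = row within the third) */
    return ((addr >> 11) & 3u) * 8u + ((addr >> 5) & 7u);
}

void z80_write_byte(uint8_t *mem, uint16_t addr, uint8_t value)
{
    mem[addr] = value;
    /* Two compares: the region does not end on a power-of-2 boundary. */
    if (addr >= VRAM_START && addr < VRAM_END)
        dirty_rows |= 1u << char_row_of(addr);
}
```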
>
>-------------------------------------------------------------------------------
>Some notes on flag handling:
>
>Both of the CP/M emulators I know about spend a lot of time handling z80 flags
>(condition codes).  After just about every instruction they do a "move sr,d0"
>or "move ccr,d0" or call GetCC() to get the 68000 flags, then they do a table
>lookup to translate them to z80 format.  After every logical instruction (not,
>or, xor etc), a second table lookup is done to set the z80 parity flag.  (The
>68000 does not have a parity flag.)  These table lookups are slow.  In fact,
>they often take several times as long as the guts of the instruction itself.
>
>Both these table lookups are totally unnecessary!  It's faster to save the
>flags in 68000 format (in a register).  Routines that test flags simply test
>the corresponding 68000 flag.  For logical instructions, simply save the parity
>byte away somewhere, and set another bit in the register to say to use the
>parity byte and not the v flag.  The parity-testing instructions (e.g., "jp po,nn")
>look at that bit, and then test either the v flag or the saved parity byte.
>The only times you need to translate flags between z80 and 68000 formats are in
>"push af" and "pop af" instructions.
>
>I got a 10..20% speedup in my Spectrum emulator this way.
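A C sketch of the lazy parity idea, assuming flags are kept as separate host booleans rather than in a 68000 register; names are hypothetical. Logical ops just save the result byte, parity is computed only when something tests P/V, and the real Z80 F byte is built only for "push af". The undocumented F bits 5 and 3 are left at 0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    s, z, h, n, c;
    bool    v;            /* overflow; meaningful when !use_parity */
    uint8_t parity_src;   /* result of the last logical op */
    bool    use_parity;   /* P/V means "even parity of parity_src" */
} lazy_flags;

static bool even_parity(uint8_t x)
{
    int bits = 0;
    for (; x; x >>= 1) bits += x & 1;
    return (bits & 1) == 0;
}

/* After AND/OR/XOR: no table lookup.  Pass h = true for AND, false for OR/XOR. */
void after_logical(lazy_flags *f, uint8_t result, bool h)
{
    f->s = (result & 0x80) != 0;
    f->z = result == 0;
    f->h = h;
    f->n = false;
    f->c = false;
    f->parity_src = result;
    f->use_parity = true;
}

bool pv_flag(const lazy_flags *f)
{
    return f->use_parity ? even_parity(f->parity_src) : f->v;
}

/* "jp po,nn" jumps when P/V is 0. */
bool jp_po_taken(const lazy_flags *f) { return !pv_flag(f); }

/* Only "push af" needs the Z80 layout: S Z - H - P/V N C. */
uint8_t pack_f(const lazy_flags *f)
{
    return (uint8_t)((f->s << 7) | (f->z << 6) | (f->h << 4) |
                     (pv_flag(f) << 2) | (f->n << 1) | f->c);
}
```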
>
>-------------------------------------------------------------------------------
>I said in my previous post that threaded code for 8088 emulation would use too
>much memory to be practical.  In fact it would be perfectly practical on an
>Amiga equipped with 3 Mbytes or more.
>
>Peter McGavin.   (srwmpnm@wnv.dsir.govt.nz)
>
>
>From: Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics [Transpiler]
>Message-ID: <5097@mindlink.UUCP>
>Date: 10 Mar 91 17:54:43 GMT
>Organization: MIND LINK! - British Columbia, Canada
>
>
>        Thank you all for the valued input into this discussion to date,  both
>pro and con.  I must say I find it very stimulating.
>
>        When discussing this form of platform-porting translator/compiler with
>anyone, I find the need for a short word to use.  In a humble attempt to coin
>a descriptive phrase, may I suggest "transpiler"?
>
>Charlie Gibbs (Charlie_Gibbs@mindlink.UUCP), Jean-Noel Moyne (jnmoyne@lbl.gov),
>Dave Clemans (dclemans@mentorg.com), Dwight Hubbard (uunet.uu.net!easy!lron),
>Pete Ashdown (pashdown@javelin.es.com) and Jyrki Kuoppala (jkp@cs.HUT.FI) confirm
>that some research, and even program development, has indeed been done in this
>direction.
>
>Eddy Carroll (ecarroll@maths.tcd.ie) has written such a transpiler, as a last
>year project, with some success, some reservations.
>
>Chris Gray (cg@ami-cg.UUCP) suggests that the most practicable route to
>designing a viable transpiler would be to make it interactive.  BTW Chris, I
>very much enjoyed your compiler articles in the Amiga Transactor.
>
>Ray Cromwell (rjc@pogo.ai.mit.edu) suggested an interesting idea: a sort of
>"Usenetware" combined development effort.  He also thinks reasonable execution
>speed can be achieved.
>
>        Those are what I perceive to be the ideas supporting the transpiler
>concept.  The statements of contrary considerations are more voluminous. These
>appear to fall into a number of categories.
>          o  Self-modifying code
>          o  Separating code from data
>          o  Determining video access
>          o  Stack handling
>          o  Jump table problems
>          o  Handling overlay segments
>
>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module be
>run concurrently with an emulator-type section so that, in some parallel way, any
>references within the source executable, which would also be loaded during
>runtime of the emulation, could be validated.  It could be argued that this
>should be placed on the pro side of the ledger, but the incurred overhead
>during execution would be large.  This function, I had imagined initially,
>would be carried out during the transpiler phase and not attached to, or
>burdening, the runtime execution.
>
>Brad Pepers (pepers@enme1.ucalgary.ca), Jyrki Kuoppala (jkp@cs.HUT.FI) and Jonathan
>David Abbey (jonabbey@cs.utexas.edu) were somewhat concerned with
>self-modifying code.  There were a significant number of voices that dismissed
>this concern.  Personally, I wouldn't worry about it.  If a particular program
>used this technique, for whatever reason, I would accept the fact that not all
>programs can be transpiled.
>
>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au), Dave Clemans (dclemans@mentorg.com),
>Sullivan (Radagast@cup.portal.com), Jean-Noel Moyne (jnmoyne@lbl.gov) and Chris
>Gray (cg@ami-cg.UUCP) raised concerns about distinguishing code from bytes of the
>data persuasion.  This is what I had thought would be the biggest stumbling
>block.  I hadn't thought there would be so many others. :-)  It's great to hear
>how long the road is before you begin the journey.
>
>Udi Finkel (finkel@TAURUS.BITNET) and Eddy Carroll (ecarroll@maths.tcd.ie) are
>concerned about determining when a memory access is to video RAM.  This, I
>believe, can be solved, if it is dealt with in the same manner as the code/data
>resolution.  Please read on.
>
>Eddy Carroll (ecarroll@maths.tcd.ie) points out problems with processor stack
>handling.  How did you resolve this with the transpiler you developed?  It
>occurs to me that every emulator, whether a transpiler or an interpretive
>type, must handle the stack properly.  To me, it would be more difficult to
>approach this problem if I were writing an emulator rather than a transpiler,
>since an emulator's tighter executable-size constraints would inhibit
>producing an intelligent stack handler.  On the 68000 the stack would be
>handled as word/longword only.  Any bytewise stack functions within the
>source executable
>would be transpiled into wordwise handling.  A problem, but surmountable.
>
>Chris Gray (cg@ami-cg.UUCP) and Eddy Carroll (ecarroll@maths.tcd.ie) raised the
>problem of coping with jump or branch tables properly.  Weighty point.  This I
>would describe as a grey area between code and data: it may be actual jump
>instructions, or a table of label locations with the code to be emulated
>calculating an offset into it.  This must be handled in a similar fashion to
>the code/data recognition problem.
>
>Dave Clemans (dclemans@mentorg.com), Kurt Tappe (JKT100@psuvm.psu.edu) and Jonathan
>David Abbey (jonabbey@cs.utexas.edu) were concerned with the handling of
>overlay segments.  This bothers me too.  With the Amiga, it would be possible
>either to use overlays, in the case of very large programs, or to convert to
>all-in-one programs by consolidating the overlay hunks.  The main problem is
>determining that overlays are employed, loading them and transpiling.  This is
>closely related to the code/data determination problem.
>
>Dwight Hubbard (uunet.uu.net!easy!lron) suggests that this would work, kind of
>like DSM.  I have DSM and must say it was a model for me when contemplating
>this idea.  Chris Gray (cg@ami-cg.UUCP) believes that an interactive compiler
>holds some promise.  In one of my initial messages to Charlie Gibbs I
>entertained the possibility of having to fall back on human intervention at the
>transpiler output source code level.  This would be determined at runtime,
>when the emulated program failed to operate correctly, or at all.  A programmer
>would be required to fix the problem.
>
>        What I interpret the huge human parallel processor that the net represents
>to be saying is that human intervention must be employed.  My only reservation
>in this regard is that there are interpretive emulators that run a wide
>variety of software on-the-fly.  To me, this feat is more difficult than a
>transpiler, which can ruminate over the executable for as long as say a
>raytracer.
>
>        To incorporate the above concepts, and arrive at a workable resolution,
>what is required is an expert system, probably requiring a resident expert (not
>software, though brain matter is soft), to resolve the code/data, video access,
>jump/branch table and overlay problems.  I don't suppose that the average
>emulator user is prepared to deal with or even understand the problems we have
>been discussing.  I don't think it would be worthwhile writing a transpiler
>unless it could be made to work in some standalone fashion.  Feed it the
>source executable and out pops the Amiga version.
>
>        Mind shifting stage left...
>
>        I have a suggestion, since it is not considered practicable to write a
>transpiler that will work unassisted, that may resolve the perceived problems
>using some of the concepts from this message thread.
>
>        First, an off-the-wall analogy.  Here he goes again. :-)
>
>        Consider a desirable program, running on another platform, useful to
>an Amiga user, but with a very large dongle attached.  In fact, this useful
>program must be run on the dongle.  This is inconvenient.  What do I do if I
>lose my dongle?  I propose you think of this as a form of copy protection for a
>moment, obstructing the Amiga owner, who also owns a program that runs on
>another platform.  Follow me so far?  The legitimate program for the other
>platform cannot be used on the favoured Amiga machine, so it needs to be
>decopy-protected.  What do you use if you wish to backup and/or deprotect
>software?  A copier program.  Most copiers can be employed by users who have
>little knowledge of copy protection, yet they succeed in making the copy.  The
>more difficult protection schemes require "brain files" which are written by
>experts to achieve this end.  End of transpiler/copy-protection analogy.
>
>        Take a basic transpiler, such as the one Eddy Carroll wrote, and add a
>toolkit.  The transpiler would do its best to translate and compile the source
>executable.  If this failed, an expert, not just the average user, would either
>run the transpiler in expert mode, or a debug tool from the toolkit, which
>would work through any problem areas.  The result of this would be an expert
>transpiler file.  The beauty of this approach is that any user could run the
>transpiler, given access to the expert file.  These expert files could be
>included with the release package and any new ones could be included in any
>updates or shared by conventional electronic means as PD.
>
>What do you think?
>
>csj
>
> No really officer, I wasn't speeding, just keeping a safe distance in front of
>the car behind me!
>
>Usenet: a542@mindlink.UUCP Phone: (604)853-5426 FAX: (604)854-8104
>
>
>From: daveh@cbmvax.commodore.com (Dave Haynie)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics
>Message-ID: <19792@cbmvax.commodore.com>
>Date: 12 Mar 91 22:39:53 GMT
>References: <1991Mar7.093149.18707@mintaka.lcs.mit.edu> <19749@cbmvax.commodore.com> <1991Mar12.011418.24768@mintaka.lcs.mit.edu>
>Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
>Organization: Commodore, West Chester, PA
>
>In article <1991Mar12.011418.24768@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>>In article <19749@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>>>In article <1991Mar7.093149.18707@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>
>>>> Why doesn't AMAX multitask? 
>
>>>Without massive patching of the Mac ROM, I don't think so.  
>
>>  Well, it's not a heavy loss if you can't make the Mac ROM resident, but
>>why does AMAX have to take over the Amiga's operating system?  The only
>>thing that would make it really difficult to run MacOS under
>>AmigaDOS is if Mac code fiddles with absolute memory locations or the
>>OS implements function calls as traps/interrupts.
>
>Well, the Mac OS fiddles with absolute memory locations, and the OS implements
>function calls as line-A exceptions.  Apparently, all the absolute locations
>are in low memory and get swapped as part of the process context when you run
>Multifinder.
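For an emulator host, line-A calls are easy to spot: on the 68000, any opcode whose top four bits are 1010 raises the line-1010 exception instead of executing. A C sketch of intercepting them, assuming a hypothetical 4096-entry table of native replacements indexed by the low 12 bits; how those bits map to particular Mac calls is ROM-specific and not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*trap_fn)(void);     /* hypothetical native replacement */
static trap_fn trap_table[4096];   /* indexed by the low 12 bits of the opcode */

int is_line_a(uint16_t opcode)
{
    return (opcode & 0xF000u) == 0xA000u;
}

/* Returns 1 if the trap was handled natively; 0 means fall back to the
   guest's own line-A exception handler (or execute normally). */
int dispatch_line_a(uint16_t opcode)
{
    if (!is_line_a(opcode))
        return 0;
    trap_fn fn = trap_table[opcode & 0x0FFFu];
    if (fn == NULL)
        return 0;
    fn();
    return 1;
}
```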
>
>>I also wonder why ReadySoft used a special disk format for Mac disks
>>instead of reading Amiga disks.
>
>No doubt this was to make Mac software work on Amiga disks.  The Mac isn't
>as nice about filesystem independence as the Amiga is, so you can't really
>provide a loadable Mac filesystem kind of thing that maps Mac filespace
>into Amiga filespace adequately.  So ReadySoft essentially just built a new
>device driver, which uses Amiga-readable formats, but Mac filesystem 
>organization.
>
>This is apparently why Macs don't generally talk to MS-DOS disks via an
>alternate filesystem like CrossDOS or MSH, but instead use a user program for 
>the conversion, along the lines of the old PC Utilities.  
>-- 
>Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
>   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
>	"What works for me might work for you"	-Jimmy Buffett
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics [Transpiler]
>Message-ID: <1338@macuni.mqcc.mq.oz>
>Date: 13 Mar 91 03:24:04 GMT
>References: <5097@mindlink.UUCP>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <5097@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen) writes:
>>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module be
>>run concurrently with an emulator type section so that in some parallel way, any
>>references within the source executable, which would also be loaded during
>>runtime of the emulation, could be validated.  It could be argued that this
>>should be placed on the pro side of the ledger, but the incurred overhead
>>during execution would be large.  This function, I had imagined initially,
>>would be carried out during the transpiler phase and not attached to, or
>>burdening, the runtime execution.
>
>No, I suggested that an image of the original code, without translation,
>be carried around with the compiled code, with indexes into the compiled
>code's equivalent sections that would allow jump tables and so forth to
>function correctly.  This would be a memory-intensive but low-time-overhead
>way of resolving problems.  It should also be noted that such a
>system would allow trapping of addresses that need to be handled by
>special code (hardware locations etc.), and that the image is also used
>for data storage, meaning that once loaded and allocated, the program
>would be unlikely to need any further allocation of heap.
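A minimal C sketch of the store path such a scheme implies, assuming a 64 KB guest address space and a hypothetical I/O window at $D000-$DFFF: translated code routes every store through one function, so hardware locations can be trapped while ordinary data lands in the carried-along original image. `store_byte`, `hardware_write` and the address range are illustrative names, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical I/O window; a real target machine defines its own. */
#define IO_START 0xD000u
#define IO_END   0xDFFFu

static uint8_t  image[65536];   /* original program image, also the data memory */
static uint8_t  last_io_value;  /* stand-in for a device register */
static unsigned io_writes;      /* counts trapped hardware stores */

/* Hypothetical hook where special-case hardware emulation would go. */
static void hardware_write(uint16_t addr, uint8_t value)
{
    (void)addr;
    last_io_value = value;
    io_writes++;
}

/* Every store performed by the compiled code goes through here: hardware
   addresses are intercepted, ordinary data simply lands in the image. */
static void store_byte(uint16_t addr, uint8_t value)
{
    if (addr >= IO_START && addr <= IO_END)
        hardware_write(addr, value);
    else
        image[addr] = value;
}

/* Loads are not trapped in this sketch. */
static uint8_t load_byte(uint16_t addr) { return image[addr]; }
```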
>
>This system solves two problems.  The first is data/code differentiation: if
>you accidentally compile some data, it is no great problem, though a much more
>minor problem remains if a couple of real opcodes are missed because data was
>accidentally compiled and the compiler treated a real opcode as the operand of
>a false one.  The second is the problem of jump tables and of jumps to
>locations stored in registers.
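The jump-table half of the argument can be sketched the same way. Assuming each translated block is a C function and `entry_index` maps every original address to its translated code (both hypothetical), an indirect jump reads its target from the untranslated image and looks it up; a NULL entry marks an address the translator never treated as code. Targets are read little-endian, as on both the 6502 and the 8088.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: each translated block returns the guest address where
   execution continues. */
typedef uint16_t (*block_fn)(void);

static uint8_t  image[65536];        /* untranslated original image */
static block_fn entry_index[65536];  /* guest address -> translated code, or NULL */

static uint16_t block_a(void) { return 0x1111; }
static uint16_t block_b(void) { return 0x2222; }

/* Fetch entry `index` of a table of 16-bit little-endian addresses stored
   in the original image, then map that address to translated code. */
static block_fn resolve_table_jump(uint16_t table_addr, uint8_t index)
{
    uint16_t slot = (uint16_t)(table_addr + 2u * index);
    uint16_t target = (uint16_t)(image[slot] | (image[(uint16_t)(slot + 1)] << 8));
    /* NULL: no translated code here; a real system would fall back to
       interpreting the image at this point. */
    return entry_index[target];
}
```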
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: CPU-emulators
>Message-ID: <18878.2801a4ad@windy.dsir.govt.nz>
>Date: 9 Apr 91 11:25:31 GMT
>Organization: DSIR, Wellington, New Zealand
>
>Ilja Heitlager (iheitla@cs.vu.nl) wrote:
>>I'm planning to write a 6502 (and maybe, if I like it, some others) emulator.
>
>Good on you!  I've played around with the Z80 emulators for the Amiga, by Ulf
>Nordquist and Charlie Gibbs, making them faster.  I have never touched the 6502,
>but the same techniques should apply.
>
>>At this moment I think there are two ways of doing it:
>>	1- Compare every Opcode and jump to a routine which executes the
>>	   instruction
>>	2- Do it more or less the way the microcode does it.
>>	   OK, in software you can't do several operations at the same moment.
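For comparison, approach 1 boils down to a fetch/decode/execute loop. This C sketch handles only five real 6502 opcodes (LDA immediate, TAX, INX, STA absolute, BRK), ignores the processor flags, and uses BRK as a hypothetical "stop" signal; it is meant only to show where the per-instruction decoding cost comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal 6502 state; flags are omitted to keep the sketch short. */
struct cpu {
    uint8_t  a, x;
    uint16_t pc;
    uint8_t  mem[65536];
};

/* Approach 1: fetch the opcode, decode it with a switch, execute it.
   The switch runs again for every single instruction. */
static void run(struct cpu *c)
{
    for (;;) {
        uint8_t op = c->mem[c->pc++];
        switch (op) {
        case 0xA9:                       /* LDA #imm */
            c->a = c->mem[c->pc++];
            break;
        case 0xAA:                       /* TAX */
            c->x = c->a;
            break;
        case 0xE8:                       /* INX */
            c->x++;
            break;
        case 0x8D: {                     /* STA abs (little-endian operand) */
            uint16_t addr = (uint16_t)(c->mem[c->pc] |
                                       (c->mem[(uint16_t)(c->pc + 1)] << 8));
            c->pc += 2;
            c->mem[addr] = c->a;
            break;
        }
        case 0x00:                       /* BRK: stop this demo interpreter */
        default:                         /* unimplemented opcodes also stop */
            return;
        }
    }
}
```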
>
>I found several more fundamentally different ways of doing it, and many
>variations on those.  So far the fastest practical method seems to be threaded
>code.  You can avoid decoding an opcode for every 6502 instruction altogether!
>The emulation routine for each 6502 opcode ends with:
>
>		move.l	(a3)+,a0
>		jmp	(a0)
>
>So each emulation routine jumps directly to the next emulation routine without
>any decoding at all.  Register a3 acts like a "pseudo pc" into a 256 kbyte
>table in which there is a longword pointer to the emulation routine for each
>corresponding opcode in the 64 kbyte 6502 address space.
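A portable C approximation of the threaded scheme, as a sketch only: C cannot end each routine with `jmp (a0)`, so a small loop performs the indirect call, but the loop itself never decodes an opcode. The table holds one routine pointer per byte of the 64 KB address space (256 KB with the 68000's 4-byte pointers, 512 KB with 8-byte pointers on a typical 64-bit host). Here the table is filled by decoding all of memory once up front; the three-opcode subset and the names `decode` and `build_table` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint8_t  a, x;
    uint16_t pc;
    int      running;
    uint8_t  mem[65536];
};

typedef void (*handler)(struct cpu *);

/* One routine per opcode; each reads its own operands from memory. */
static void op_lda_imm(struct cpu *c) { c->a = c->mem[(uint16_t)(c->pc + 1)]; c->pc += 2; }
static void op_tax(struct cpu *c)     { c->x = c->a; c->pc += 1; }
static void op_inx(struct cpu *c)     { c->x++; c->pc += 1; }
static void op_stop(struct cpu *c)    { c->running = 0; }  /* BRK or unknown */

/* The "pseudo pc" table: one routine pointer per guest address. */
static handler table[65536];

static handler decode(uint8_t op)
{
    switch (op) {
    case 0xA9: return op_lda_imm;
    case 0xAA: return op_tax;
    case 0xE8: return op_inx;
    default:   return op_stop;
    }
}

/* Decode every address once, before running. */
static void build_table(const struct cpu *c)
{
    for (uint32_t addr = 0; addr < 65536; addr++)
        table[addr] = decode(c->mem[addr]);
}

/* Dispatch: look up the routine for the current pc and call it. */
static void run(struct cpu *c)
{
    c->running = 1;
    while (c->running)
        table[c->pc](c);
}
```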
>
>Now, every time the 6502 writes to RAM, you need to update an entry in the
>256 kbyte table.  At first it looks as if you have to do an instruction decode
>to compute the new table value every time the 6502 writes to RAM.  But in fact
>that is not necessary either!
>
>What you do, when the 6502 writes to RAM, is to write a constant address into
>the table.  That constant address points to a special routine called "patch".
>When patch is called, you finally get to do an instruction decode.  Patch
>computes the address of the routine for the current instruction, stuffs it
>in the 256 kbyte table, then jumps to the routine for the current instruction.
>Next time this instruction is executed, control bypasses patch and goes
>directly to the right routine.
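The lazy "patch" mechanism can be sketched in C as follows, again with a hypothetical opcode subset and a loop standing in for the 68000 `jmp`. Every emulated store resets the table entry for that address to `patch`; `patch` decodes once, caches the result in the table, and runs it. Because each routine fetches its operands from memory when it runs, only a store to an opcode byte actually changes which routine is needed.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint8_t  a, x;
    uint16_t pc;
    int      running;
    uint8_t  mem[65536];
};

typedef void (*handler)(struct cpu *);

static handler  table[65536];
static unsigned decodes;                 /* how often patch had to decode */

static void op_lda_imm(struct cpu *c) { c->a = c->mem[(uint16_t)(c->pc + 1)]; c->pc += 2; }
static void op_inx(struct cpu *c)     { c->x++; c->pc += 1; }
static void op_stop(struct cpu *c)    { c->running = 0; }

static handler decode(uint8_t op)
{
    switch (op) {
    case 0xA9: return op_lda_imm;
    case 0xE8: return op_inx;
    default:   return op_stop;
    }
}

/* "patch": decode the opcode at pc, cache the routine, then run it. */
static void patch(struct cpu *c)
{
    decodes++;
    table[c->pc] = decode(c->mem[c->pc]);
    table[c->pc](c);
}

/* Every emulated store to RAM points that address back at patch. */
static void write_mem(struct cpu *c, uint16_t addr, uint8_t value)
{
    c->mem[addr] = value;
    table[addr] = patch;
}

static void reset_table(void)
{
    for (uint32_t addr = 0; addr < 65536; addr++)
        table[addr] = patch;
}

static void run(struct cpu *c)
{
    c->running = 1;
    while (c->running)
        table[c->pc](c);
}
```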
>
>A variation of this method, which saves memory but is slightly slower, is to use
>word offsets in a 128 kbyte table, instead of longword addresses in a 256 kbyte
>table.  Each routine ends with:
>
>		move.w	(a3)+,d0
>		jmp	0(a2,d0.w)
>
>where a2 holds the base from which all the routine offsets are computed.
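In C, the nearest equivalent of word offsets from a base register is a 16-bit index into an array of routines. The sketch below assumes a hypothetical two-routine set; `slots` costs 2 bytes per guest address (128 KB) whatever the host pointer size, at the price of one extra indexing step per dispatch. Unset slots are zero, which here means "stop".

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint8_t  x;
    uint16_t pc;
    int      running;
    uint8_t  mem[65536];
};

typedef void (*handler)(struct cpu *);

static void op_stop(struct cpu *c) { c->running = 0; }
static void op_inx(struct cpu *c)  { c->x++; c->pc += 1; }

/* The shared base: every routine is reached relative to this array, much
   as the 68000 version adds a word offset to a2. */
enum { SLOT_STOP = 0, SLOT_INX = 1 };
static const handler routines[] = { op_stop, op_inx };

/* 2 bytes per guest address instead of a full pointer. */
static uint16_t slots[65536];

static void run(struct cpu *c)
{
    c->running = 1;
    while (c->running)
        routines[slots[c->pc]](c);   /* extra indexing step per dispatch */
}
```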
>
>This method has more advantages:
>
>1: To handle known ROM entry points, just point the vector for the entry point
>at an optimised 68000 routine to do what the ROM routine does.  There is no
>overhead at all in checking for ROM entry points.
>
>2: To handle multiple-byte opcodes (e.g., Z80 prefix instructions), patch can be made
>smart enough to point the vector for the prefix byte to the routine for the
>entire instruction.  There is no need to decode opcodes after the prefix every
>time the instruction is executed.
>
>3: Patch can be made smart enough to recognise common sequences of 6502
>instructions, and to point the vector at an optimised 68000 routine for the
>whole sequence.
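Advantage 1 can be sketched like this: the table entry for a ROM entry point simply points at a native routine, which does the ROM routine's work and then behaves like RTS, pulling the return address that JSR pushed (the address of JSR's last byte) and adding one. The entry address $FFD2 is only an example (it is where the Commodore 64 KERNAL keeps its character-output routine), and `native_chrout` and the output buffer are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint8_t  a, sp;
    uint16_t pc;
    uint8_t  mem[65536];
};

typedef void (*handler)(struct cpu *);
static handler table[65536];

/* Example ROM entry point (hypothetical for this sketch). */
#define ROM_CHROUT 0xFFD2u

static char     out_buf[64];
static unsigned out_len;

/* 6502 stack lives in page 1; pull pre-increments SP. */
static uint8_t pull(struct cpu *c) { c->sp++; return c->mem[0x0100 + c->sp]; }

/* Native replacement: do what the ROM routine does, then act like RTS. */
static void native_chrout(struct cpu *c)
{
    if (out_len < sizeof out_buf - 1)
        out_buf[out_len++] = (char)c->a;
    uint8_t lo = pull(c);
    uint8_t hi = pull(c);
    c->pc = (uint16_t)(((hi << 8) | lo) + 1);
}

/* No run-time check for ROM addresses: the vector itself does the work. */
static void install_rom_hooks(void) { table[ROM_CHROUT] = native_chrout; }
```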
>
>Note that 2 and 3 above (if implemented) won't correctly emulate certain types
>of self-modifying code.
>
>There was a good article on "Portable Fast Direct Threaded Code" by Eliot
>Miranda in comp.compilers recently.  He uses GCC to write "machine independent"
>threaded code that is just about as efficient as my 68000-specific code.
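This is not code from that article; it is only a minimal illustration of the GCC labels-as-values extension (`&&label` and `goto *ptr`, also accepted by Clang), which lets C express the "fetch next routine address and jump" step directly. The tiny opcode set (0 = stop, 1 = increment, 2 = double) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Requires GCC or Clang: "&&label" and "goto *ptr" are compiler extensions. */
static int run_threaded(const uint8_t *code, int x)
{
    static void *const dispatch[] = { &&op_stop, &&op_inc, &&op_double };
    void *thread[64];
    int n = 0;

    /* Translate once: replace each opcode with the address of its routine. */
    do {
        assert(n < 64 && code[n] < 3);
        thread[n] = dispatch[code[n]];
    } while (code[n++] != 0);

    void **ip = thread;
    goto **ip++;               /* like "move.l (a3)+,a0 / jmp (a0)" */

op_inc:
    x += 1;
    goto **ip++;
op_double:
    x *= 2;
    goto **ip++;
op_stop:
    return x;
}
```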
>
>Hope this helps.  Regards, Peter McGavin.   (srwmpnm@wnv.dsir.govt.nz)
>
>