Der Mediaplex Sampler - Die 6 von Plex

Text File | 1994-02-07 | 102KB | 2,091 lines

Due to the discussion on the software emulation of other systems on the
MEGADEV mailing list, I am uploading this comp.sys.amiga.emulations
discussion, which occurred in 1991. It contains quite a bit of useful
information and ideas for anyone interested in producing their own
emulators. It concentrates on the 6502 and Z-80, although the concepts
are generally applicable to other systems as well.

Ian.

>From: jnmoyne@lbl.gov (Jean-Noel MOYNE)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <10652@dog.ee.lbl.gov>
>Date: 6 Mar 91 03:03:52 GMT
>Organization: Lawrence Berkeley Laboratory
>References:<4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>
>
>    Yes, interesting indeed. I've _heard_ (i.e. don't quote me on
>that) that a company once did such an "application translator" on Unix.
>You could take an MS-DOS .exe file, and the program would generate
>Unix C source code, mapping all the MS-DOS int traps onto legal Unix OS
>calls. I've heard this program was able to translate well-known commercial
>programs, and that these programs then ran under a normal
>Unix multi-user configuration. The end of the story is that the people
>who made this software could never even try to sell it (guess why).
>
>    Anyway, doing such a thing wouldn't be impossible technically (but
>it'd be hard for sure! (-:), especially for MS-DOS programs. All these
>big packages are made to run on all the different flavors of IBM clones,
>which means they use no dirty tricks; they stick to BIOS calls so they'll
>run even on an Amstrad (or under the Transformer or IBeM, for instance).
>That means you could quite easily put all the BIOS ints in an Amiga shared
>library and, taking the 8088 binary, generate C code from it.
>Programs a la "re-sourcer" already exist; they take the .exe file and
>generate an 8088 source code file from it. At this point it's possible to
>generate C source code that would do the same thing. You could optimise it to
>recognise patterns in the ASM instructions (don't forget most of these big
>programs, the ones you're interested in running on the Amiga, are usually
>written in C or some other high-level language, and so the binary is
>generated by a compiler, thus creating patterns ...), and eventually build
>C source code not too far from the original (the guys who wrote Microsoft
>C could eventually write a reverse compiler that would generate code
>very close to the original, and such a program wouldn't be machine
>dependent at all).
>
>    Another problem would be to catch the busy-wait loops in these
>mono-tasking programs (I think most of these loops are in the BIOS, like
>waiting for a key, which is easy to take care of). Then these programs
>would be multitasking friendly, so that you could eventually run more than
>one at the same time without losing performance.
>
>    Anyway, all this is my mind wandering sideways (I've thought
>about that before), and most of what I wrote is closer to dreams than to
>reality (and in bad English, sorry about that, still learning (-:). Mostly
>because, even if it's not impossible to write such a program, it's for sure
>very hard! And so it'll take a lot of time and effort, more than what a
>lone guy doing that on the side of his regular job could invest. A
>team in a company could do it, but no company's gonna take such a risk;
>there's too much potential danger in that, even if the potential sales are
>very big! Sure, one could say that if you buy the original program, you
>should be able to run it ... even on another machine, but I'm sure the
>army of lawyers from, let's say, Microsoft or Ashton-Tate would have
>something to say about that!
>And they could sue you for years without breaking a
>sweat, preventing you from selling the program before the end of the
>trial... etc ... etc ...
>
>    Then again, if the program was freeware, written by a bunch of
>programmers not earning money for it, this would be a different story,
>much different. But I'm really no lawyer, I'm still thinking sideways ....
>(-:
>
>
>    JNM
>
>--
>These are my own ideas (not LBL's)
>
>
>From: Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics (sorry long post)
>Message-ID: <4992@mindlink.UUCP>
>Date: 4 Mar 91 15:20:48 GMT
>Organization: MIND LINK! - British Columbia, Canada
>
>
>    The following messages were shared in an email environment between Charlie
>Gibbs and myself over the last couple of days. We decided that it would be a
>good idea to share these thoughts with the readership of
>comp.sys.amiga.emulations. The messages speak for themselves for the most
>part.
>
>   ------------ Included text of messages ------------
>
>Sun Mar 3 6:40:53 1991 From: Chris Johnsen [547] Subject : Emulators
>
>Hello Charlie! Long time no talk to you, eh? I haven't seen you for quite some
>time. I was particularly fond of those after-PaNorAmA meetings at Wendy's a
>few years ago. I trust you recall me from back then.
>
>I have been following the comp.sys.amiga.emulations newsgroup on Usenet for
>quite a while now and was quite interested in the IBeM program that has been
>posted about recently. In one of the near Alpha/Theta states that, being a
>programmer, such as yourself, one recognizes as some kind of "source", I had
>an idea. Since there are not many "new" ideas spawned I wanted to get some
>feedback from a person with the expertise you possess. The reasons I thought
>of you for this private consultation are: 1) you've written an emulator -
>SimCPM 2) you've written an assembler - A68K 3) you have a broad knowledge of
>the computing field beyond lowly PC's and 4) you're such a nice guy!
>
>I was musing about how an emulator would work. I must confess I have no
>concrete knowledge of this. It has been said that "Not knowing how to do a job
>'right' frees you to discover new and better ways of accomplishing the desired
>result". Freeman Patterson (a Canadian photographer), in his book Photography
>and the Art of Seeing, called this "thinking sideways". I believe I've
>thought of a way of writing an emulator that would run at least as fast on the
>Amiga as it would on the source machine.
>
>My questions for you are:
>
>Would you be interested in being a mentor to me in developing this idea further
>or in fact ascertaining whether my concept has any validity?
>
>I'm reluctant to spill this on the net until I'm somewhat confident that it is
>indeed practicable. I have thought about the possible legal considerations of
>producing an emulator. There are potential commercial possibilities that one
>could consider also.
>
>Would you please give me just a quick, superficial rundown of the basic
>algorithms used in developing an emulator?
>
>I have assumed that you would read in the object module of, say, an IBM
>executable, read it by opcode or routine, decipher the intent and then call a
>library of glue routines to do the job that the program would have done on an
>IBM or clone. I have no idea how the interrupt structure would be handled but
>know that you have done it with SimCPM.
>
>I don't want to waste your time, but I would appreciate this information. I
>would like to get your feedback on these questions to see if you are interested
>in further discussion with me on this matter. If you are not interested or
>don't have the time I'll certainly understand. I realize I'm asking you for
>more information than I'm sharing but an idea is a tiny property and I'd like
>to at least savor this for a while before I decide what to do with it. That is
>the sole reason I have not gone public with it yet.
>
>If you are indeed interested, after you tell me on the highest level how an
>emulator functions, I'll be able to describe this idea on some kind of
>comprehensible basis. I'm looking forward to your response! Thanks Charlie.
>
>csj
>
>Sun Mar 3 23:12:32 1991 From: Charlie Gibbs [218] Subject : Emulators
>
>    Indeed I am somewhat short of time these days, but I wouldn't mind kicking
>the odd idea around without getting too wrapped up in it. I do understand your
>idea of "thinking sideways" and enjoy being able to do it myself from time to
>time. A similar way I've heard it described is that even though it's been
>proven that bumblebees can't fly, they don't realize this and so do it anyway.
>Or to put it another way, I like to write programs that are too stupid to know
>what they're doing, so they can do anything.
>
>    Your ideas on emulation are basically in line with what's considered the
>standard way of doing things. A machine instruction is analyzed as to just
>what it's supposed to do, and appropriate code then carries out the operation.
>The guts of SimCPM appeared in Dr. Dobb's Journal as an emulator that was meant
>to run under CP/M-68K. This made the job a bit easier for the original author,
>since the CP/M-68K system calls were quite similar to the CP/M calls that he
>was emulating. I had to replace this portion of code with appropriate AmigaDOS
>routines. In addition, I extended the code to handle the full Z-80 instruction
>set, since the original code could only handle the 8080 subset.
>
>    Since emulating another processor in software is quite a CPU-intensive
>process (several machine instructions have to be executed to emulate a single
>machine instruction on the target machine) I tried to optimize SimCPM for speed
>at the expense of memory and redundant code.
>The overhead of a single
>subroutine call, plus any extraction and interpretation of arguments, would
>require several times as much time as a hand-picked set of instructions
>dedicated to a single opcode.
>
>    For system calls there's an easy out - as soon as I recognize what the
>emulated program is trying to do (e.g. read a block from a disk file), I call
>the corresponding AmigaDOS routine, so I/O can proceed pretty well at native
>speeds. Therefore, even though CPU-bound programs might run 10% as fast as
>they would on the target machine, I/O-bound programs might get up to 50% of
>speed.
>
>    Interrupts were easy - since most CP/M systems don't use hardware
>interrupts, except possibly for a very few hardware-dependent programs, I
>simply didn't worry about them. Software interrupts (the RST instruction) were
>a snap on the other hand, since they're basically a special-purpose subroutine
>call.
>
>    The Intel 8080 is fairly easy to emulate because the opcode is uniquely
>determined by the first byte of the instruction. Some instructions might have
>register numbers encoded in a few bits of that first byte, but I just treat
>them as special cases. To decode the byte, I just multiply its value by 4
>(shift left 2 bits) and use the result as an index into a table of 256 pointers
>to the actual emulation routines. Since there are 64 possible MOV instructions
>(well, 63 because one of the bit combinations is actually HLT) I actually have
>63 MOV emulations, one for each combination of registers. This means that I
>don't have to do any register decoding, since each routine consists of a
>dedicated 68000 MOVE instruction, followed by a jump back to the main emulation
>loop. Lots of almost-redundant code, but it's about as fast as it can get.
>
>    This is getting kind of long-winded. I'd be interested in hearing any
>ideas you may have; although I wouldn't have time to get into the programming
>of such stuff, I'm willing to act as a sounding board. Talk to you soon...
>CJG
>
>Sun Mar 3 23:18:48 1991 From: Charlie Gibbs [218] Subject : Emulators
>
>    Another approach I've heard of is to "compile" the code to be emulated
>into native machine code. This would involve a front-end program which would
>read the target machine's program and analyze the instructions. For instance,
>if the "compiler" detects an instruction that does a move between two of the
>emulated machine's registers, it would simply generate a move instruction in
>the emulating machine's code. It could generate either a translated assembly
>language source file or a machine-language file ready to load into the
>emulating machine. This would require the "compilation" process to be run once
>on the program to be emulated, and you'd then run the output of this
>"compiler." There are special tricks to consider here, such as resolving
>addresses - you couldn't just copy the memory addresses across because the
>emulated routines would likely be a different size. It might be easier to
>generate a label (e.g. Axxxx where xxxx is the hex address in question) in an
>assembly source file and let the emulating machine's assembler sort it all out.
>
>    I've never actually seen this process in action, but it's another
>possibility. --CJG
>
>Mon Mar 4 12:38:31 1991 From: Chris Johnsen [547] Subject : Emulators
>
>Thanks a lot for your effort in explaining SimCPM to me, man. As you describe
>it, it would seem that I had intuitively understood the basic concepts. I
>would think that interrupts would be the hardest part to get down to reliable
>operation. What I had in mind, while thinking about this, in general terms,
>was an emulator that was non-specific as to the machine, therefore I was
>attempting to contemplate it handling say IBM, Mac, (hey it may even be
>possible to deal with Amiga emulation!) and Atari ST on the Amiga and
>imagining what the various architectures would require. All this on a very
>abstract level.
>
>Your second message hit the nail on the head!
>I got bogged down at about the
>level you describe in your first message. Lots of details to be sought and
>worked out. Gee, I'm really not even available to code another program just
>yet anyway. I was giving the whole concept a rest when, what I thought of,
>kind of sideways (lazy minds tend to look for an easier way around an
>obstacle, sometimes unconsciously, even though this can lead to harder, though
>more elegant solutions to problems), was to read the opcodes from the "source
>executable" of the emulated machine, producing an assembly listing of the
>program. This I imagine would be a two pass process, sort of like a C
>compiler, followed by an assembler's two passes, and finished off with a
>linker.
>
>I thought that, if the compiler was "intelligent" enough, the output, though
>likely larger, would be much faster than the common "interpreter type
>emulator". I had never heard of such an idea and since there are none out
>there, wanted to discuss this with you. I have developed the idea no further
>than this in essence.
>
>I did think of a few other considerations however. If one could, indeed,
>compile an executable image of say Lotus 123 from the IBM into a program which,
>on a base Amiga, could run at half speed, or on an A2500 or A3000 at twice the
>speed, it would be a viable alternative, besides being a neat toy. However,
>the standalone program generated would likely infringe on the copyright of
>Lotus because the Amiga executable would actually be Lotus 123. Take
>WordPerfect for instance. The latest Amiga version available is 4.1 or just a
>micro-point higher. No problem: get hold of the IBM version (5.1, I believe it
>is), compile it, and you have something some other people are wailing for. Of
>course the rebuttal (I can hear you thinking?) is that, if a person owns
>WordPerfect he has an inalienable right to run it. Run it on an IBM. Run it on
>a clone. Run it through an Amiga compiler. You know, if it's for personal
>use, etc.
>
>As to the increase in size of the "compiled emulation" program, I have a couple
>of ideas. First, the executable, though larger, would be standalone, except
>for any support libraries. This doesn't mean that this "form" of emulator,
>more like a "translator of executables", would be any less efficient than the
>"interpreter type". Perhaps more memory efficient in a couple of possible
>ways. Since the interpretation section of the program is in the compiler, and
>the source executable is not required at runtime, memory usage may well be less
>with a "compiled emulation". The second concept is to use link libraries which
>would bind only the emulation routines required to the final program. Possibly
>a combination of bound-at-link-time modules of less frequently used routines
>and a shared library of essential routines all programs would need. A solely
>link library approach would leave this concept open to claims that pirates
>could produce "warez" that need no extra code or setup to work. Of course,
>pirates appear to be capable of ripping anything off anyway!
>
>This "compiled emulation" would, given sufficient memory and CPU
>speed/efficiency, allow the running of multiple programs. Both emulated
>programs and standard Amiga programs. Through the use of a shared library more
>than one emulating program could be run without the overhead of multiple
>"emulation interpreters" resident in memory.
>
>The compiler could generate C statements so that you could take advantage of
>the advancements in technology in the compiler, assembler and linker, without
>having to deal, directly, with those parts of the system. I know this would
>make the compiler operation more unwieldy. More operations, therefore it
>would take longer, but theoretically the source is bugless, so you would
>expect the output of this "emulation compiler" to either succeed or fail.
>You'd run the emulator on the program only once.
>The beauty of producing
>assembler (C would be better here), is that if it didn't work first time, a
>programmer type could patch it up in source and get it running. I'm really
>intrigued by this idea. Where did you hear about it, do you remember?
>
>My knee-jerk reaction initially was to file the idea, but then I got to
>wondering why no one had done it. There were many emulators out there for
>various source machines. Why were none of them compilers? Another idea I had
>was to contact the dude in Oz (or is he a Kiwi?) that wrote IBeM. He already
>has the emulation working except for the parallel and serial ports. It would
>appear that he reads the IBM object code, deciphers it and runs a routine, or
>simply does a MOVE using an opcode lookup table, as you suggest; an
>interpreter. If he instead simply wrote out an instruction in ASCII to do the
>call or move, using a shared library of his emulation routines, he'd
>basically have it. The end user would also have to have an assembler or C
>compiler, however. This type of approach has got to produce faster emulation,
>if it is possible. I believe it to be.
>
>Anyway, that's what I had in mind Charlie. I really do appreciate your
>feedback on this. Care to comment on any directions you think could be
>followed? Do you know anyone with enough venture capital to fund the further
>development of this concept? ;-) Do you think I should approach the author of
>IBeM (cute name) directly? Or, should this private discussion we've been
>having be moved to Usenet? Thanks again Charlie! I appreciate you man.
>
>csj
>
>Mon Mar 4 15:36:53 1991 From: Charlie Gibbs [218] Subject : Emulators
>
>    I can't remember where I first heard of the idea. The converted code
>won't necessarily be smaller than the original, depending on the relative sizes
>of corresponding machine instructions on both machines.
>However, if you could
>make the compiler really smart it might be able to recognize certain sequences
>of instructions and replace them by sequences designed to accomplish the same
>thing more efficiently. For instance, since the 8080 doesn't have a multiply
>instruction it needs to fake it with a bunch of adds and shifts. A smart
>compiler, if it could recognize such a routine, could replace it with a single
>68000 multiply instruction and see huge savings.
>
>    I'd stay away from calling subroutines; the overhead could kill you.
>
>    The copyright issue could be a sticky one, although I can't see any
>problems if you run the converter on your own copy of the emulated software and
>don't try to sell the result. It would no doubt be classified as a "derivative
>work".
>
>    Perhaps it might be interesting to throw this discussion out to Usenet. It
>won't be a trivial job, which is probably why we haven't seen it done
>elsewhere. Remember that a straight machine-code emulation duplicates all the
>register fiddling that is required by the target machine's architecture (and
>the 80x86 family needs a LOT of register fiddling). This code is replaced by
>the 680x0's own internal fiddling if you're re-compiling source code. One way
>of looking at it is to decompile the original machine code, then recompile it
>for the new machine.
>
>    Interesting stuff... CJG
>
>   ------------ End of included text of messages. -----------
>
>Both Charlie Gibbs and myself frequent this newsgroup and look forward to any
>additions to this discussion with which others may respond. Sorry that the
>posting is so long but I felt there was little enough chaff contained in the
>messages to warrant including all of them.
>
>csj
>
>The hard way is usually the disguised easy way, you take your choice.
>Usenet: a542@mindlink.UUCP   Phone: (604)853-5426   FAX: (604)854-8104
>
>
>From: rjc@pogo.ai.mit.edu (Ray Cromwell)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>Date: 6 Mar 91 01:01:41 GMT
>References: <4992@mindlink.UUCP>
>Organization: None
>
>
>    Very interesting article. I myself have been tempted several times
>to try and write an emulator. Since I programmed 6502 assembly on the
>C64 for 4 years, and I know 68000 on the Amiga, I was tempted to
>try to beat the speed of the other emulators. Then I realized
>the sheer magnitude of the project. Emulating the instruction set is
>easy. In fact, I am quite confident I can make a 6502 emulator run
>faster on the Amiga than on the C64. The hard part is the hardware.
>Most C64 programs discard the OS entirely and bang on the hardware.
>Furthermore, most of them use polling loops, like polling the
>raster beam register and using precisely cycle-timed delays. Moreover,
>the VIC chip contains several glitches that allow programmers to
>use tricks to remove the borders, vertically scroll the screen to
>ANY raster location, horizontally shift the screen, vertically and
>horizontally interlace the screen, and stretch pixels to double, triple
>or quadruple length vertically. This is virtually IMPOSSIBLE to
>detect, unless the emulator is artificially intelligent. And any
>program that has a fastloader won't work. This is because fastloaders
>usually transfer data over the serial clock line as well as the data line.
>This doubles bandwidth; unfortunately it requires PERFECT timing, so perfect
>in fact that it won't work on PAL computers, and vice versa.
>Sprites are another problem, since Amiga sprites are only 16 pixels wide, and
>C64 sprites can have their width and height doubled, and they rely
>on a chunky pixel format. Text is another problem since the
>C64 has a built-in text mode.
>
>    The Mac is the easiest computer to emulate because it's not a computer
>at all. The Macintosh computer does not exist, it's nothing more
>than a ROM chip.
>
>A few days ago, I was impressed. I downloaded a demo from ab20 called
>C64Music.zap. This demo emulates 6502 at 100% (in fact, it emulates it
>at perfect timing because the music is exactly the same speed.) This
>demo emulates the SID chip PERFECTLY, and I mean perfect. These guys
>should join together with the maker of A64.
>
>
>    I can't speak for other 6502 emulators, but if I wrote one, the fastest
>method looks like table lookup, with careful optimization to make
>sure things are long word aligned. For instance, I might do something like
>
>pc6502  equr a5
>accum   equr d5
>xindex  equr d6
>yindex  equr d7
>stack   equr a4 ; 6502 stack, which is base address + $0100 on the C64
>stat    equr d4 ; status register
>
>
>
>  [allocmem the 6502's address space, load in an
>   executable and set the pc to its start]
>
>        lea     jumptbl(pc),a2
>loop    moveq   #0,d0           ; clear d0 every time: handlers clobber it
>        move.b  (pc6502)+,d0    ; fetch the opcode byte
>        lsl.w   #2,d0           ; 4 bytes per jump table entry
>        move.l  0(a2,d0.w),a3
>        jmp     (a3)
>
>
>then every instruction would be emulated (even undoc's) and put into
>the jumptbl. The code for 'LDA address' might look like:
>
>lda     moveq   #0,d0
>        move.b  (pc6502)+,d0    ; low byte of the operand address
>        moveq   #0,d1
>        move.b  (pc6502)+,d1    ; high byte of the operand address
>        lsl.w   #8,d1
>        or.w    d1,d0           ; this code inverts the 6502 little-endian
>        add.l   addresspace,d0  ; and then adds the base address of the
>                                ; memory that was allocated for it
>        move.l  d0,a3
>        move.b  (a3),accum
>        jsr     _LVOGetCC(a6)   ; exec.library GetCC() ? (assumes ExecBase in a6)
>                                ; It might be better to use move SR, providing
>                                ; you check the machine you are running on
>                                ; and do a move CCR otherwise
>                                ; status reg is now in d0
>        and.w   #mask,d0        ; mask off everything but the Z bit
>        bne     setz
>        bclr    #1,stat         ; result non-zero: clear the 6502 Z flag
>        jmp     loop
>setz    bset    #1,stat         ; result zero: set the 6502 Z flag
>        jmp     loop
>
>(note: this code can be optimized, it's off the tip of my tongue, and
>probably bugged since I haven't coded in asm in a while)
>
>From my quick calculations, the jump table dispatcher incurs about a 3-4
>microsecond delay in the fetch of each instruction. This is equivalent
>to about 4 cycles on a 6502 @1.02MHz. If you had infinite amounts of RAM,
>the object code loader could 'inline' the code for each instruction
>and get rid of this delay. I believe this is probably how the C64Music
>demo does it, since music players on the C64 were only about 1k of code.
>
>The LDA routine itself looks about 2.2 times slower than on a true 6502,
>where it takes 4 cycles. However a 25MHz 68030 would run it more than
>twice as fast.
>
>
>Theoretically speaking, an IBM emulator running on an Amiga 3000
>should be running at at least 5MHz 80286 speed. Consider SoftPC on
>the NeXT, which runs at 12MHz UNDER UNIX. 68040's are about 2-3 times
>faster than 68030's, so SoftPC on the Amy should run at about 5MHz.
>
>
>    Maybe we all should try something like 'NetIBM'. What I mean is,
>like Nethack, we should all participate in coding an IBM emulator.
>Each person might post a small code segment (in assembler), and the rest
>of us can compete to optimize it. I remember having a contest with
>someone trying to optimize the 'size' of a binary to decimal print
>routine; the final result was the code was reduced 300% in size.
>(We kept passing the optimized source back and forth, each shedding
>a few bytes.)
>
>
>Regardless of what happens, let's keep the discussion up; it's interesting
>and educational.
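
The table-lookup dispatch described in the post above can be sketched in a few lines of Python. This is only an illustration of the idea, not the poster's code: the class and function names, the load address `0x0600`, and the use of BRK as a "stop" instruction are assumptions made for the example. Only four 6502 opcodes are filled in; every other table slot points at a handler that raises an error.

```python
# Minimal sketch of jump-table dispatch for a tiny 6502 subset.
# All names here are hypothetical; only four opcodes are implemented.

class CPU:
    def __init__(self, program, origin=0x0600):
        self.mem = bytearray(0x10000)            # the 64K 6502 address space
        self.mem[origin:origin + len(program)] = program
        self.pc = origin
        self.a = 0                               # accumulator
        self.zero = False                        # Z flag
        self.negative = False                    # N flag
        self.running = True


def fetch_byte(cpu):
    value = cpu.mem[cpu.pc]
    cpu.pc = (cpu.pc + 1) & 0xFFFF
    return value


def fetch_word(cpu):
    low = fetch_byte(cpu)                        # 6502 is little-endian:
    high = fetch_byte(cpu)                       # low byte comes first
    return (high << 8) | low


def set_nz(cpu, value):
    cpu.zero = value == 0
    cpu.negative = bool(value & 0x80)


def lda_immediate(cpu):                          # opcode $A9
    cpu.a = fetch_byte(cpu)
    set_nz(cpu, cpu.a)


def lda_absolute(cpu):                           # opcode $AD
    cpu.a = cpu.mem[fetch_word(cpu)]
    set_nz(cpu, cpu.a)


def sta_absolute(cpu):                           # opcode $8D (no flags change)
    cpu.mem[fetch_word(cpu)] = cpu.a


def brk(cpu):                                    # opcode $00, used here only to stop
    cpu.running = False


def illegal(cpu):
    opcode = cpu.mem[(cpu.pc - 1) & 0xFFFF]
    raise ValueError(f"unimplemented opcode ${opcode:02X}")


# One entry per possible opcode byte, like the 256-entry jumptbl in the post.
JUMP_TABLE = [illegal] * 256
JUMP_TABLE[0x00] = brk
JUMP_TABLE[0x8D] = sta_absolute
JUMP_TABLE[0xA9] = lda_immediate
JUMP_TABLE[0xAD] = lda_absolute


def run(cpu):
    while cpu.running:
        JUMP_TABLE[fetch_byte(cpu)](cpu)         # fetch, index the table, jump
    return cpu


# LDA #$2A ; STA $0200 ; LDA #$00 ; LDA $0200 ; BRK
program = bytes([0xA9, 0x2A, 0x8D, 0x00, 0x02, 0xA9, 0x00, 0xAD, 0x00, 0x02, 0x00])
cpu = run(CPU(program))
```

The per-instruction cost the post worries about is visible here as the indexed call inside `run`; inlining the handlers, as the post suggests for the music demo, would remove that indirection at the price of memory.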
>
>
>
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1303@macuni.mqcc.mq.oz>
>Date: 6 Mar 91 12:57:34 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <1991Mar6.010141.5905@mintaka.lcs.mit.edu> rjc@pogo.ai.mit.edu (Ray Cromwell) writes:
>>    The Mac is the easiest computer to emulate because it's not a computer
>>at all. The Macintosh computer does not exist, it's nothing more
>>than a ROM chip.
>
>The best description of a Mac I've ever read! :-)
>
>>    I can't speak for other 6502 emulators, but if I wrote one, the fastest
>>method looks like table lookup, with careful optimization to make
>>sure things are long word aligned. For instance, I might do something like
>
>How about this?
>
>Write code for every one of the 256 possible opcodes, and make sure that
>it follows these important considerations:
>
>1) The code is responsible for updating all registers, the 6502PC,
>   and also for checking special functions if a hardware location
>   is accessed.
>2) It should be 256 bytes per opcode or less.
>3) It should jump back to STLP (see below) when done.
>4) It should intelligently introduce timing delays (very hard).
>5) Something else I've forgotten.
>
>Take each piece of code, and distribute it in a 64K memory area, so that
>the code for opcode N starts at base + (N << 8). There may be a lot of
>wasted space, but on any reasonable Amiga, this space can be sacrificed
>for speed IMO.
>
>        ; Base reg of instruction code in A0.  6502PC in A1.
>
>STLP    MOVE.B  (A1),lcn        ; lcn is the upper byte of the
>                                ; dsp displacement below
>        JMP     dsp(A0)
>
>
>Excuse my poor 68K code. I think this will work, but it has been three
>years since I even looked at assembler on the 68000.
>Before anyone screams about this code failing on a 68030, remember that the
>modification made above is only needed until the JMP. After that, we do
>not need to see it again. If I read the info on the 68030 cache
>properly, this should be okay. At worst, the instruction cache can be
>disabled.
>
>This should be considerably faster than any table driven system. On any
>system with multi-byte opcodes (eg. the 8088), this system can be easily
>extended, and minor modifications can be made to emulate most modern CPUs
>with reasonable speed.
>
>Now a suggestion for dealing with hardware locations (harder on memory
>mapped systems than on things like Intel and Zilog chips which have I/O
>instructions.) This is another memory-expensive solution, BTW.
>
>For every location, have two flags, one for read and one for write. If
>the opcode handlers write to the location, they can check the write flag
>and if it is set, call a handler which determines which location is
>being modified, and pass control to a routine to do it.
>
>Now for the VERY memory expensive solution.
>
>Have one 16 bit displacement per memory location (ie. for 1M of
>simulated RAM you must have 3M of real RAM.) This word contains the
>displacement from a base register to a routine which handles that memory
>location. If this word is non-zero, an instruction that reads or writes
>that location calls this routine. You could even have two displacements, one
>for read and one for write, but this would only be practical for systems
>that have small amounts of simulated memory (eg. C64).
>
>At a rough estimate, for a C64 emulator (IMO the hardest of all
>emulators.)
>
>        Simulated Memory            64K
>        Read Displacements         128K
>        Write Displacements        128K
>        Opcode handlers             64K
>        Memory Location Handlers    28K (?)
>        Rest                       100K (??)
>                                   ----
>                                   512K
>
>So, for an average 1M Amiga, this is quite possible.
>
>As for the project, why not start with something much simpler? An Atari
>2600 emulator.
>The hardware is quite simple to simulate, though the
>timing will have to be *very* precise. This would be a good starting
>point. Opinions, anyone?
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1304@macuni.mqcc.mq.oz>
>Date: 6 Mar 91 13:52:25 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu> <1303@macuni.mqcc.mq.oz>
>Organization: Macquarie University, Sydney, Australia.
>
>An idea about code compilation, rather than interpretation.
>
>If you assume that a program does not employ self-modifying code, this
>may not be nearly as hard as it first appears (I am not saying that it is
>trivial, just not too bad.) This assumption can be made if the program
>runs out of ROM, and also in 99% of non-ROM cases.
>
>The problems of compiling code for one CPU into that for another are
>fairly easy, but a major problem occurs when table-driven code is
>employed (common.)
>
>What I am suggesting is this. It is reasonably elegant, and quite
>simple to perform:
>
>Store the loaded memory image of the original program as well as the
>compiled code. For every instruction in this image, a pointer to the
>compiled code is generated indicating where the corresponding code
>actually is. As the program executes, the working store is this memory
>image, and if a piece of data is read, it can be dealt with normally.
>However, if a program attempts to jump, branch or call a location using
>data loaded (as would happen with a table), then the corresponding
>location can be determined and the correct code called.
>
>This has a high memory overhead (3-5 times original image size *plus*
>compiled code).
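
The address-map idea above can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the five-instruction toy machine (`LOAD`, `ADD`, `JMP`, `JMPI`, `HALT`) and its encodings are invented, and the "compiled code" is simply a list of tuples standing in for native instructions. What it does show is the mechanism from the post: the original memory image stays live as the working store, and an indirect jump reads its target from that image and translates it through a per-instruction map into the compiled code.

```python
# Sketch of "keep the original image plus a map into the compiled code".
# The toy instruction set below is hypothetical, invented for illustration.

LOAD, ADD, JMP, JMPI, HALT = 0x01, 0x02, 0x03, 0x04, 0xFF
LENGTH = {LOAD: 2, ADD: 2, JMP: 3, JMPI: 3, HALT: 1}   # bytes per instruction


def word(image, addr):
    """Read a 16-bit little-endian value from the memory image."""
    return image[addr] | (image[addr + 1] << 8)


def compile_image(image, start, end):
    """Translate the code bytes in [start, end) into a list of (op, operand)
    tuples and build the map: original address -> index in that list."""
    compiled, entry_for = [], {}
    addr = start
    while addr < end:
        op = image[addr]
        entry_for[addr] = len(compiled)
        if op in (LOAD, ADD):
            operand = image[addr + 1]
        elif op in (JMP, JMPI):
            operand = word(image, addr + 1)
        else:
            operand = None
        compiled.append((op, operand))
        addr += LENGTH[op]
    return compiled, entry_for


def run(image, compiled, entry_for, start):
    acc = 0
    index = entry_for[start]
    while True:
        op, operand = compiled[index]
        index += 1                                # fall through by default
        if op == LOAD:
            acc = operand
        elif op == ADD:
            acc = (acc + operand) & 0xFF
        elif op == JMP:
            index = entry_for[operand]            # could be resolved at compile time
        elif op == JMPI:
            target = word(image, operand)         # data read from the live image
            index = entry_for[target]             # translated via the map
        elif op == HALT:
            return acc


image = bytearray(64)
image[0:11] = bytes([LOAD, 5,         # 0: acc = 5
                     JMPI, 0x20, 0,   # 2: jump through the pointer at $20
                     ADD, 100,        # 5: acc += 100
                     HALT,            # 7
                     ADD, 1,          # 8: acc += 1
                     HALT])           # 10
image[0x20:0x22] = bytes([8, 0])      # jump table entry: points at address 8
compiled, entry_for = compile_image(image, 0, 11)
result = run(image, compiled, entry_for, 0)
```

Because the jump table lives in the original image, rewriting `image[0x20]` to point at address 5 redirects the indirect jump without recompiling anything. The map only has entries for instruction starts, so a jump into the middle of an instruction fails with a `KeyError` in this sketch.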
>
>I am sure that there is a more memory-effective way of doing this by
>identifying such tables during the compile, but it seems a difficult
>problem. Has anyone got any better solutions?
>
>BTW, an idea for semi-compiled code. Have two branches for every location,
>and simulate the machine that way. Reading an opcode calls the
>appropriate opcode handler, writing it calls a reevaluation routine that
>changes the pointers for the two locations. System and special
>locations are handled similarly. This may actually be a quite
>reasonable solution, as inelegant as it first sounds.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: rjc@geech.ai.mit.edu (Ray Cromwell)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics
>Message-ID: <1991Mar7.093149.18707@mintaka.lcs.mit.edu>
>Date: 7 Mar 91 09:31:49 GMT
>Organization: None
>
>
>  I got an idea. How about combining both methods? (compile and interpret)
>
>  First compile the executable with flow analysis into 68k code. Any
>instruction that tries to touch a hardware register will become
>a trap instruction (or use an MMU to trap it) to emulate the hardware.
>
>  Where does the interpreter come in? Any code that tries to self modify
>or do a change of state (table lookup, indirect jumps, etc) will be
>interpreted.
>
>  Sounds hard? You bet. The easiest solution of all is to throw faster
>CPUs at the problem.
>
>
>Now for another question:
>
>  Why doesn't AMAX multitask? It should be EASY to run multiple copies
>of the Mac ROM as a task (since it's a ROM, it's mostly pure code, except
>each task would need to alter where the ROM thinks the system memory is.)
>
>  For things like the Atari emulator, or another 68000 emulator, the compile
>technique should work great! Just recompile the parts of the code
>that bang on the hardware, or use an MMU to trap them.
>
>
>The whole compilation process is made easier when the machine
>you're trying to emulate has an object code format that contains
>separate data and code chunks, and perhaps some relocation data.
>Furthermore, if the machine has rules against self modifying code,
>and a device independent OS it becomes trivial.
>
>Could someone run SI or dhrystone on IBeM and tell me how it performs?
>It is said that SoftPC on a Mac @25mhz 68030 runs at 6mhz AT speed.
>
>Does anyone know if IBM code uses self modifying code, or
>jump tables? What kind of code do their compilers produce?
>And, does the IBM have a special object code format that separates
>code and data?
>
>
>The compilation technique could work well, but the compiler would have
>to be VERY smart in its flow analysis of detecting code/data, jump
>tables, self modifying code, PC counter and stack manipulation, etc.
>
>I'll clarify my thoughts tomorrow, it's going on 4:30am here, and I
>need to get some well deserved sleep. :-)
>
>
>From: stevek@amiglynx.UUCP (Steve K)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <stevek.3380@amiglynx.UUCP>
>Date: 6 Mar 91 23:12:59 GMT
>References: <4992@mindlink.UUCP>
>Organization: Amiga-Lynx Bbs Multi-Node 201-368-0463 NJ,USA
>
>The idea of a "translator compiler" is very interesting. I could have sworn I
>saw some talk about an IBM->Atari ST translator that exists on FidoNET. But
>why is everyone talking about IBM? Why not Macintosh programs? Though I am
>not anything close to a professional programmer, I'd imagine programs that run
>off different computers with the same processor would share some op-codes,
>right? Even if that is not true, they do both have similar GUIs, windows,
>gadgets, and menu bars, etc. which should make the translation easier and more
>like the original.
>I sincerely hope someone will pick up and pursue these
>ideas, it would be very beneficial for me and others - I'd love to run a nice
>pascal compiler like Turbo Pascal, and an excellent ANSI editor like TheDraw!
>
>- Steve Krulewitz - Fair Lawn, NJ -
>
>
>From: dclemans@mentorg.com (Dave Clemans @ APD x1292)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar6.212548.9641@mentorg.com>
>Date: 6 Mar 91 21:25:48 GMT
>References: <4992@mindlink.UUCP>
>Organization: Mentor Graphics Corporation
>
>On the "compiled" approach to emulation:
>
>There definitely has been previous work in this area. The
>system I've heard of was an internal tool that was used experimentally
>to port some games from the Intel world to the 68K world.
>
>To use it, you basically had to develop enough information to
>get a clean disassembly of the Intel code; i.e., so that you "knew"
>where all the code and data was in the source object file.
>That then was used to drive the tool that produced the "compiled"
>68K file. After that was done you had to go over the output
>for correctness, system dependencies, etc.; it was not intended
>as a turn key system.
>
>...
>
>As a side issue, to bring over some of the bigger DOS packages
>you'll have to worry about more than just converting a single
>file. You have to find and convert all of their code overlay files....
>
>dgc
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1312@macuni.mqcc.mq.oz>
>Date: 8 Mar 91 03:51:21 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <1991Mar6.212548.9641@mentorg.com> dclemans@mentorg.com (Dave Clemans @ APD x1292) writes:
>>To use it, you basically had to develop enough information to
>>get a clean disassembly of the Intel code; i.e., so that you "knew"
>>where all the code and data was in the source object file.
>>That then was used to drive the tool that produced the "compiled"
>>68K file. After that was done you had to go over the output
>>for correctness, system dependencies, etc.; it was not intended
>>as a turn key system.
>
>Well, on a 6502, it is practically impossible to truly decide what is
>code and what is data. Let's imagine that you are (as I have done),
>writing a 6502 disassembler. For every byte you store a status that
>says DATA, OPCODE, ARGUMENT, and also non-exclusive flags (eg.
>BRANCHENTERSHERE, BRANCHISNOTHERE.) Initially, all bytes are set to DATA,
>and BRANCHISNOTHERE.
>
>Write a recursive procedure that starts at the program entry point, and
>goes through the code. For every byte read as an opcode, tag it as an
>OPCODE, and the bytes following as ARGUMENT. If you get to a branch
>instruction, recursively call the new branch point, and continue
>processing that until you hit a byte that has already been processed
>(ie. not tagged DATA.) You should continue until the whole procedure
>exits, then run the same thing on the RESET, INT and NMI vectors. Calls
>are treated the same way as branches, except that the routine exits to a
>higher level invocation when it hits an RTS or RTI.
>
>Now, you should have all the data tagged as either program (OPCODE and
>ARGUMENT), or DATA. Right? Wrong. Why? Because the 6502 has no
>branch always instruction, and your program may continue past what
>appears to be a conditional branch, into data, and screw everything up
>completely.
>
>I experimented with using a two pass approach to this problem.
>First, the program was scanned sequentially, treating every byte as an
>opcode, and tagging every point referenced by some branch, call, jump or
>vector. Then, when a branch was found during the second recursive pass,
>the program would backtrack and examine every last opcode till it hit a
>branch-in point (after which no assumptions could be made), to see if
>the flags were left in a deterministic state. At this point I lost
>interest in the whole idea.
>
>Anyway, on the 6502 and anything without a BRA or equivalent, the
>problem of automatically determining what is data and what is code is
>extremely difficult.
>
>However, on the 68K, this approach is probably quite profitable. Why?
>Because there is enough correspondence between the 6502 and 68K
>instruction sets (both having the same ancestor, the 6800) to mean that
>the compilation process is reasonably simple.
>
>Simulating the hardware is still a problem, and I'll have to give that
>one some thought... I still tend to favor the idea that I presented in
>a previous article, carrying around a compiled image (of code only), and
>uncompiled data with labels to the compiled code and handlers for the
>I/O locations.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: drysdale@cbmvax.commodore.com (Scott Drysdale)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <19614@cbmvax.commodore.com>
>Date: 7 Mar 91 20:15:07 GMT
>References: <4992@mindlink.UUCP> <stevek.3380@amiglynx.UUCP>
>Reply-To: drysdale@cbmvax.commodore.com (Scott Drysdale)
>Organization: Commodore, West Chester, PA
>
>In article <stevek.3380@amiglynx.UUCP> stevek@amiglynx.UUCP (Steve K) writes:
>>The idea of a "translator compiler" is very interesting.
>>I could have sworn I
>>saw some talk about an IBM->Atari ST translator that exists on FidoNET. But
>>why is everyone talking about IBM? Why not Macintosh programs? Though I am
>
>IBM to Atari ST ports should be relatively simple. the ST essentially runs
>CP/M 68K with several extensions, much like ms-dos. so low level calls
>will pretty much translate directly, and you only have to worry about access
>to hardware.
>
>>- Steve Krulewitz - Fair Lawn, NJ -
>
>  --Scotty
>--
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>Scott Drysdale           Software Engineer
>Commodore Amiga Inc.     UUCP {allegra|burdvax|rutgers|ihnp4}!cbmvax!drysdale
>                         PHONE - yes.
>"Have you hugged your hog today?"
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
>From: cg@ami-cg.UUCP (Chris Gray)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <cg.7040@ami-cg.UUCP>
>Date: 9 Mar 91 08:11:28 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com> <1312@macuni.mqcc.mq.oz>
>Organization: Not an Organization
>
>In article <1312@macuni.mqcc.mq.oz> ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>writes:
>
>[Discussion of using a recursive approach to run through all branches and
>calls to tag bytes as being code/data.]
>
>>However, on the 68K, this approach is probably quite profitable. Why?
>>Because there is enough correspondence between the 6502 and 68K
>>instruction sets (both having the same ancestor, the 6800) to mean that
>>the compilation process is reasonably simple.
>
>My Amiga disassembler, Dis (which is available somewhere or other; I believe
>I sent it out last year some time) does this. The thing it has trouble
>with are things like function variables. In writing disk-resident libraries,
>a lot of that is done, so it doesn't work too well on such things.
>It has special code to handle case/switch statements generated by several
>compilers, but an assembler program doing a branch table will usually mess
>it up. I don't think there is much hope of an automated translator.
>Perhaps some kind of interactive one, so that the user can aid in
>identifying what is code and what is data.
>
>--
>Chris Gray    alberta!ami-cg!cg  or  cg%ami-cg@scapa.cs.UAlberta.CA
>
>
>From: finkel@TAURUS.BITNET
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <2406@taurus.BITNET>
>Date: 9 Mar 91 21:26:03 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.004247.8964@cpsc.ucalgary.ca>
>Reply-To: finkel%math.tau.ac.il@CUNYVM.CUNY.EDU (Udi Finkelstein)
>Organization: Tel-Aviv University Math and CS school, Israel
>
>as someone who wrote an 8085 emulator/disassembler/monitor for the C64
>I would like to contribute my own thoughts on the subject.
>
>I have toyed with the idea of writing an IBM PC program translator that
>would take an IBM program and 'translate' it to run on the amiga, but
>after careful examination of the idea I decided to drop it for a few
>reasons:
>
>1. There is no way to find out at compile time which memory references
>are accessing special RAM areas such as the text/graphics video screen.
>
>2. self modifying code breaks such schemes easily
>
>3. code/data separation can be tough. for example, it's very hard to detect
>if a memory block contains code (translate it), data (don't touch it)
>or worse - near or far pointers.
>
>(2) is rare, but (1) and (3) are common, so I guess many programs will
>break.
>
>
>even commercial systems claiming the ability to 'compile' PC binaries
>into UNIX programs such as XDOS (anyone heard of them lately??) aren't
>automatic.
>
>Instead, I decided to try concentrating on the 'usual' type of emulators
>such as IBeM and Transformer.
>
>What caught my attention is that a large portion of the time an emulator
>spends emulating an instruction goes to checking whether a memory
>read/write accesses video memory. every address being written to memory must
>be checked whether it lies in the $BXXXX range, and if it does, it should
>be written to the screen.
>
>What I really wanted to do if I had an MMU based machine is to write an
>emulator that will use the MMU to track such memory accesses. The emulator's
>memory will be referenced without translation, but every address in the range
>where the video memory is located will be caught by the MMU and special code
>will be run to handle it. This would speed things up.
>
>Udi
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1323@macuni.mqcc.mq.oz>
>Date: 10 Mar 91 03:33:49 GMT
>References: <4992@mindlink.UUCP> <1991Mar6.004247.8964@cpsc.ucalgary.ca> <2406@taurus.BITNET>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <2406@taurus.BITNET> finkel%math.tau.ac.il@CUNYVM.CUNY.EDU (Udi Finkelstein) writes:
>>as someone who wrote an 8085 emulator/disassembler/monitor for the C64
>>I would like to contribute my own thoughts on the subject.
>
>An 8085 toolkit on a C64? The mind boggles. Why?! :-)
>
>>2. self modifying code breaks such schemes easily
>
>I have yet to see a piece of self-modifying code on an IBM PC, and it
>will cause problems on the many 386DX and 486 systems that have instruction
>caches without write-through.
>
>>What I really wanted to do if I had an MMU based machine is to write an
>>emulator that will use the MMU to track such memory accesses. The emulator's
>>memory will be referenced without translation, but every address in the range
>>where the video memory is located will be caught by the MMU and special code
>>will be run to handle it. This would speed things up.
>
>Check my article on using lots of memory to very quickly tag this
>occurrence on non-PMMU systems, by vectoring every location to a
>handler.
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: ecarroll@maths.tcd.ie (Eddy Carroll)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics (sorry long post)
>Message-ID: <1991Mar9.225807.26560@maths.tcd.ie>
>Date: 9 Mar 91 22:58:07 GMT
>References: <4992@mindlink.UUCP>
>Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
>
>In article <4992@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>writes:
>>
>> [ Interesting discussion about a new sort of PC emulator ]
>
>My final year project for college last year tackled exactly this problem, i.e.
>given a PC executable file, produce an Amiga executable file which will
>perform the same function. The end result worked, in a limited sort of way;
>it was fairly easy for it to be misled by any program of reasonable size.
>
>There are a few tricky problems that must be handled if you want to achieve
>a good degree of success:
>
>  o Given the instruction MOV reg,VAL do you treat VAL as a constant value
>    or as an offset to an address in one of the PC's segments? The 8086
>    uses the same opcode for both cases, and it is far from easy to work out
>    which meaning to use. It's important to get it right because in the
>    translated program, all the constant values should remain the same
>    but address offsets will be different.
>
>  o How do you handle jump tables? Most compilers implement switch/case
>    statements with jump tables, but it is not always clear what the bounds
>    of such tables are.
>
>  o On the PC, byte values can be pushed onto the stack as byte values. On
>    the 68000, you can only push word values.
>    You can see why this might be
>    a problem if you consider a function accessing a parameter at an offset
>    of 8 from the stack pointer. If the preceding parameters were four words,
>    then this is okay. If they were two words and four bytes however, then
>    the offset on the 68000 needs to be 12 to access the same data.
>
>  o Given that most major PC programs access the PC's video RAM directly
>    for speed reasons, such accesses must be mapped to act equivalently
>    on the Amiga. The problem is that the usual method for doing this is to
>    set a segment register to point to the video RAM and then to access the
>    RAM through this. The same register is likely used for all sorts of
>    other data access elsewhere in the program.
>
>Of course, there are other problems too. The main thing I discovered during
>the course of the project was that it's a lot harder to do translation than
>it might first seem.
>
>My program performs the translation in two passes. The first pass traces out
>all possible paths through the PC code, building up a symbol table and
>separating code from data as it goes. The second pass then walks sequentially
>through the code, producing 68K assembly source code for each line. Certain
>instructions generate macros instead of opcodes, which are later expanded
>to calls to library functions. The resulting source is fed into Charlie's A68K
>and then linked with a library that sets up a PC-like environment (custom
>screen, ibm.font etc.) and handles the INT calls. The result is a standalone
>executable file.
>
>If anyone else has a shot at it, I'd be interested in seeing how it turns out.
>Perhaps a more practical route to take is to produce a translator that can
>take reasonably well written PC source code (say, C or assembly) and
>automatically convert it to Amiga source code. If such a tool existed and
>worked well, it might encourage more companies to port their products to the
>Amiga.
>
>In the meantime, back to IBeM.
>--
>Eddy Carroll           ----* Genuine MUD Wizard  | "You haven't lived until
>ADSPnet:  cbmuk!cbmuka!quartz!ecarroll           |    you've died in MUD!"
>Internet: ecarroll@maths.tcd.ie                  |   -- Richard Bartle
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: 8 methods to emulate a Z80
>Summary: 8 methods to emulate a Z80 on a 68000
>Keywords: Z80 68000 Amiga
>Message-ID: <18847.27d80900@windy.dsir.govt.nz>
>Date: 8 Mar 91 21:58:24 GMT
>Organization: DSIR, Wellington, New Zealand
>
>Ok folks, here are 8 methods for doing z80 emulation on a 68000, in software.
>(Well, 8 methods to decode a z80 instruction and get to the right
>emulation routine, anyway.)
>
>Trade-offs are speed, space and cleanliness. They all fall short of
>"compiling and optimising", but most of these methods will speed up most
>existing emulators. As you might expect, the largest and dirtiest code is
>usually the fastest (and least portable). The same methods should work with
>emulation of 6502, PDP-11 and any other 16-bit processors.
>
>In all methods, I assume there is a 64kb block of memory representing the z80's
>address space, allocated by AllocMem (say), and pointed to by "z80ram".
>
>-------------------------------------------------------------------------------
>Method 1: The "standard" method:
>
>I call this method "standard" because it's used in both of the CP/M z80
>emulators I know about. The general idea is to decode the current instruction
>and jump to the appropriate emulation routine via a vector table. That is,
>like a CASE statement with 256 selections. The code is clean and re-entrant.
>
>; Setup
>        move.l  z80ram,a2               ; load pseudopc
>        lea.l   optabl(pc),a1           ; a1 always points to optabl
>        lea.l   mloop(pc),a3            ; a3 always points to mloop
>
>; Main loop (decode) starts here
>mloop:  moveq   #0,d0                   ; 4 Execute appropriate subroutine.
>        move.b  (a2)+,d0                ; 8 Grab the next opcode and inc pc.
>        asl     #2,d0                   ;10 D0 high word is still zero!
>        move.l  0(a1,d0.w),a0           ;18 Get address of routine from table
>        jmp     (a0)                    ; 8 Do the subroutine.
>                                        ;48 total cycles to decode
>        even
>optabl: dc.l    nop00,lxib,staxb,inxb,inrb,dcrb,mvib,rlc
>        dc.l    ...
>
>Each z80 instruction emulation routine ends with:
>
>        jmp     (a3)
>
>-------------------------------------------------------------------------------
>Method 2: The "position-independent" method:
>
>This is slightly quicker, the executable is more than 1500 bytes smaller, and
>you get another register to play with in the emulator (a1 in this case). I
>currently use this method (or close to it) in my Spectrum emulator. The code
>is clean and re-entrant.
>
>        move.l  z80ram,a2               ; load pseudopc
>        lea.l   mloop(pc),a3            ; a3 always points to mloop
>mloop:  moveq   #0,d0                   ; 4 clear opcode word
>        move.b  (a2)+,d0                ; 8 get opcode byte
>        add.w   d0,d0                   ; 4 2 bytes per entry
>        move.w  optabl(pc,d0.w),d0      ;14 get offset of routine
>        jmp     optabl(pc,d0.w)         ;14 do instruction
>                                        ;44 total to decode
>        even
>optabl: dc.w    nop00-optabl,lxib-optabl,staxb-optabl,inxb-optabl
>        dc.w    inrb-optabl,dcrb-optabl,mvib-optabl,rlc-optabl
>        dc.w    ...
>
>Each instruction emulation routine ends with:
>
>        jmp     (a3)
>
>-------------------------------------------------------------------------------
>Method 3: The "decode-at-end-of-instruction" method:
>
>(There are really 2 methods described here.) Take either method 1 or method 2.
>Instead of ending each emulation routine with "jmp (a3)", end each one with a
>complete copy of the code from mloop to the indirect jmp. There is no longer
>a main loop, because each instruction jumps directly to the next one.
>
>This method is slightly faster, takes maybe twice as much code, is clean, and
>is re-entrant. It also saves yet another reserved register, in this case a3.
>(Personally, I find that a z80 emulator needs as many free registers as you
>can get your fingers on.)
>
>-------------------------------------------------------------------------------
>Method 4: The "threaded jsr's" method:
>
>Warning: This method uses self-modifying, non-re-entrant code, and therefore
>is not recommended. This code is hazardous to your cache! (No flames please
>--- read on).
>
>Introduce a 390kb contiguous block of code (called thread) which looks like
>this:
>
>thread: jsr     patch           ; 0
>        jsr     patch           ; 1
>        ...
>        jsr     patch           ; 65535
>        jmp     thread
>
>That is, there is a jsr instruction for each byte in the z80's address space.
>This is in addition to z80ram.
>
>To start the emulator, you transfer control to "thread". What the "patch"
>routine does is to replace the current "jsr patch" with "jsr this_routine",
>where this_routine is the emulation routine for the corresponding opcode in
>z80ram. Then patch jmps to this_routine to execute the instruction and to
>return to the next jsr in the thread. After a while, patch will no longer be
>called (except by z80 self modifying code), and every jsr made will be to
>emulate a z80 opcode directly.
>
>Whenever a z80 instruction writes to RAM, it patches the corresponding
>"jsr this_routine" with "jsr patch". As a variation, it could patch
>"jsr this_routine" with "jsr new_routine", but that would probably be slower
>in general.
>
>Advantage:
>
>It would be faster than methods 1 to 3, --- I think, --- especially in the
>Spectrum emulator, which has to do a lot of work with every write to RAM to
>check for ROM and video RAM anyway. The main reason for the extra speed is
>that it no longer has to decode the opcode on every instruction. There are
>the extra overheads of call and return though, and extra work to do on every
>RAM write.
>
>Disadvantages:
>
>1: The code breaks C='s self-modifying code law. To run on Amigas with
>caches, it would have to either disable the caches or update them manually
>after every patch.
>The code is extremely dirty, not re-entrant, and
>definitely not recommended;
>
>2: You need 390k contiguous memory (plus another 64k somewhere else, plus
>whatever else you need for video).
>
>Other characteristics:
>
>Code would run slowly the first time round the loop, then speed up.
>
>--------------------------------------------------------------------------
>Method 5: The "replicated code" method.
>
>Warning: This also uses self-modifying, non-re-entrant code and is therefore
>not recommended.
>
>Thread consists of 65536 blocks of code, each long enough to emulate the
>trickiest z80 instruction. Initially it contains 65536 copies of patch. (You
>will need A LOT of contiguous memory.) What patch does is to actually copy
>the code for the opcode over itself, then transfer control to the beginning of
>itself. (Tricky, but it can be done.) Every emulation routine finishes with
>a "bra.s next_instr" so they are all really the same length. That saves the
>call and return overhead.
>
>If an emulation routine is too long, then just use a jmp to somewhere. In
>practice, you would probably start with:
>
>        jsr     patch
>        bra.s   next_instr
>
>in every slot, rather than a complete copy of patch. Z80 RAM writes would
>copy the above code to the corresponding slot, if necessary, rather than
>copying the whole patch routine.
>
>Short of "compiling and optimising", this is the fastest method I can think of,
>but it is incredibly space-wasting, self-modifying, extremely dirty, and
>definitely not recommended.
>
>--------------------------------------------------------------------------
>Method 6: The "threaded vector table" method:
>
>Ok, now to fix the self-modifying code problem. Take method 4 (threaded jsr's),
>but use a 262kb vector table in a private data segment, instead of a thread in
>the code segment.
>
>vectors: dc.l   patch           ; 0
>         dc.l   patch           ; 1
>         ...
>         dc.l   patch           ; 65535
>         dc.l   jmp_thread
>
>The main instruction loop looks like:
>
>        lea.l   vectors,a0
>        lea.l   mloop(pc),a2
>mloop:  move.l  (a0)+,a1        ;12 cycles
>        jmp     (a1)            ; 8 cycles
>
>and every instruction finishes with "jmp (a2)". A0 is acting as a "pseudo-pc"
>into the vector table. Of course patch performs the same functions as before
>(except it is no longer self modifying, it just patches a vector). The vector
>table still needs to be updated by every write to Z80 RAM. The code is
>re-entrant provided each task has a separate copy of the vector table.
>
>--------------------------------------------------------------------------
>Method 7: The "position-independent threaded vector table" method:
>
>Same as method 6, except that now the private data segment is:
>
>thread: dc.w    patch-base      ; 0
>        dc.w    patch-base      ; 1
>        ...
>        dc.w    patch-base      ; 65535
>        dc.w    jmp_thread-base
>
>and the main loop is:
>
>        lea.l   thread,a0
>        lea.l   mloop(pc),a1
>mloop:  move.w  (a0)+,d0        ; 8 cycles
>        jmp     base(pc,d0.w)   ;14 cycles
>base:
>patch:  ...
>op00:   ...
>op01:   ...
>jmp_thread: ...
>
>Now it is position-independent, only 128kb contiguous memory, the executable
>is 1500 bytes smaller, and it is slightly slower (only by 2 cycles per z80
>instruction though). The code is re-entrant provided each task has a separate
>copy of the vector table.
>
>--------------------------------------------------------------------------
>Method 8: The "decode-at-end-of-instruction threaded vector table" method:
>
>Same as method 6 except that every opcode emulation routine finishes with:
>
>        move.l  (a0)+,a1
>        jmp     (a1)
>
>instead of "jmp (a2)". Now isn't that faster? And it saves a2 for more
>important things.
>
>Unfortunately you can't do exactly the same thing to method 7 unless you can
>write a complete z80 emulator in 256 bytes 8-) . But you could take method 7
>and end each emulation routine with:
>
>mloop:  move.w  (a0)+,d0
>        lea.l   base(pc),a1
>        jmp     0(a1,d0.w)
>
>instead.
>The code is re-entrant provided each task has a separate copy of
>the vector table.
>
>--------------------------------------------------------------------------
>Personally I'm considering using one of the methods 6, 7 or 8 in the next
>version of the Spectrum emulator (probably method 8) (That is, if I ever get
>enough spare time without more interesting things to do.) I'll probably make
>the source public domain. That will use more Amiga RAM, but should go faster
>(I hope). Any guesses as to which method will be the fastest, and still fit
>comfortably in a 512k machine?
>
>Unfortunately I don't think any of the methods (except the first 3) are
>suitable for an 8088 emulator because of the huge memory requirements.
>
>I'm interested in any ideas anyone might have along these lines. The
>discussion of "compiling and optimising" is very interesting, but I don't see
>how the details would work. In particular, how do you cope with self-modifying
>code, code loaders, overlays etc?
>
>
>Peter McGavin.   (srwmpnm@wnv.dsir.govt.nz)
>
>Disclaimer: I haven't tested any of the above ideas (except 1 and 2). If you
>see any bugs, point them out.
>
>
>From: daveh@cbmvax.commodore.com (Dave Haynie)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics
>Message-ID: <19749@cbmvax.commodore.com>
>Date: 11 Mar 91 20:25:55 GMT
>References: <1991Mar7.093149.18707@mintaka.lcs.mit.edu>
>Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
>Organization: Commodore, West Chester, PA
>
>In article <1991Mar7.093149.18707@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>
>>  Why doesn't AMAX multitask? It should be EASY to run multiple copies
>>of the Mac ROM as a task (since it's a ROM, it's mostly pure code, except
>>each task would need to alter where the ROM thinks the system memory is.)
>
>Without massive patching of the Mac ROM, I don't think so. The same reason
>you don't have more sophisticated multitasking on the Mac itself.
>You can't
>run multiple copies of the same Mac ROM code, since the code is not reentrant
>like Amiga code. Without some clever MMU tricks, I don't think you could
>easily relocate things such that several different copies of the ROM code
>could coexist. At least, you would expect Apple to have considered any of
>the more mundane tricks available on any 680x0 system. Apple does seem to have
>decent technical folks, I doubt they missed any easy tricks...
>
>Via MMU, you could certainly get further. Multifinder didn't do that, since
>it would have required an MMU based system for any multitasking. I'm surprised
>Apple didn't work out anything like that for their MacOS-under-UNIX, though.
>
>>Does anyone know if IBM code uses self modifying code, or
>>jump tables?
>
>Self modifying code and other uglies are very prevalent in MS-DOS code. That's
>the main reason Intel went for a unified code/data cache for the 80486.
>Separate I and D caches with internal Harvard architecture (like the '030 and
>'040) can yield a much faster system, all else being equal. But that extra
>performance would not have been worth tossing out what was apparently a large
>portion of the MS-DOS programs out there.
>
>MS-DOS programs, more and more often, think they're the operating system.
>Every program is responsible for managing differences between CPUs, managing
>memory (beyond the 8088 model), graphics, etc.
>
>--
>Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
>   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
>        "What works for me might work for you"  -Jimmy Buffett
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: 8 methods to emulate a Z80
>Summary: 4 more methods to emulate a Z80
>Keywords: Z80 68000 Amiga Emulator
>Message-ID: <18850.27dbf20e@windy.dsir.govt.nz>
>Date: 11 Mar 91 21:09:34 GMT
>References: <18847.27d80900@windy.dsir.govt.nz>
>Organization: DSIR, Wellington, New Zealand
>
>Here are 4 more methods for doing z80 instruction decoding on a 68000. These
>methods are all based on fixed-size instruction emulation routines.
>
>There are some general comments, on handling multiple-byte opcodes, writing to
>hardware registers, and flag handling, at the end of this post.
>
>These 4 methods are faster than methods 1..3 (see previous post), they do not
>have the RAM-write overheads of methods 4..8, and they do not require tables of
>opcode routine address offsets. Some of these methods might be superior to all
>previous methods.
>
>In all these methods, each z80 instruction emulation routine is fixed at 256
>bytes in length. (See earlier post by Ian Farquhar.) They are coded in the
>sequence op_80, op_81, ... op_ff, op_00, op_01, ... op_7f. So there is exactly
>64k of code. If a routine is shorter than 256 bytes, the extra space is
>wasted. If a routine is longer than 256 bytes then it will need a jsr to
>somewhere.
>
>-------------------------------------------------------------------------------
>Method 9: The "self-modifying fixed-size routine" method:
>
>Warning: Self modifying code follows.
>
>This method is almost the same as in Ian Farquhar's earlier c.s.a.e post.
>
>setup:  lea.l   op_00(pc),a1            ; a1 always points to op_00
>        move.l  z80ram,a2               ; load pseudopc
>
>mloop:  move.b  (a2)+,1$-op_00+4(a1)    ;16 patch $ff in jmp instruction, inc pc
>1$:     jmp     $ff00(a1)               ;10 Jump to routine.
>                                        ;26 total cycles
>
>Every instruction emulation routine ends with a copy of mloop, rather than a
>jump back to mloop.
>
>The move.b patches the high byte of the offset in the second instruction, so
>the jump goes to the right routine. (1$-op_00+4 might be 1$-op_00+2 --- I
>haven't checked.)
>
>The code is extremely fast (maybe the fastest yet), but it does not work on
>Amigas with instruction caches (68020 and up).
>
>I thought of making it even faster by permanently setting a4 to the address of
>the byte to patch, and then using:
>
>mloop:  move.b  (a2)+,(a4)              ;12 Patch $ff in jmp instruction, inc pc
>        jmp     $ff00(a1)               ;10 Jump to routine.
>                                        ;22 total cycles
>
>but you run out of reserved registers if there are lots of copies of mloop.
>(I.e., you can't use the decode-at-end-of-instruction technique.) Using
>"jmp (a3)" at the end of every routine makes it slower.
>
>-------------------------------------------------------------------------------
>Method 10: The "standard fixed-size routine" method:
>
>Now we eliminate self-modifying code.
>
>The main loop is coded into the wasted space at the end of op_ff (so that it is
>within 128 bytes of op_00 --- Remember that op_00 is in the middle of the 64k
>code block).
>
>setup:  subq.l  #2,sp                   ; make room for scratch area on stack
>        clr.w   (sp)                    ; low byte of scratch area is always 0
>        lea.l   mloop(pc),a3            ; a3 always points to mloop
>        move.l  z80ram,a2               ; load pseudopc
>
>mloop:  move.b  (a2)+,(sp)              ;12 Opcode to scratch high byte, inc pc
>        move.w  (sp),d0                 ; 8 high byte is opcode, low byte is 0
>        jmp     op_00(pc,d0.w)          ;14 Jump to routine.
>
>Each z80 instruction emulation routine ends with:
>
>        jmp     (a3)                    ;10
>                                        ;44 total cycles
>
>Unfortunately it's quite a bit slower. We can do better...
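The layout used by the fixed-size methods (op_80 ... op_ff first, then op_00 ... op_7f, with op_00 in the middle of the 64k block) follows from the index register being sign-extended. The C fragment below is only a sketch of that arithmetic, not part of the original post; the names `routine_offset` and `block_position` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROUTINE_SIZE 256      /* each emulation routine is padded to 256 bytes */
#define BLOCK_SIZE   0x10000L /* 256 routines * 256 bytes = 64k of code */

/* Value the 68000 sees in d0.w once the opcode sits in the high byte of a
 * word whose low byte is 0; the index register is sign-extended. */
int32_t routine_offset(uint8_t opcode)
{
    int32_t word = (int32_t)opcode * ROUTINE_SIZE;              /* opcode << 8 */
    return (int32_t)(word >= 0x8000 ? word - BLOCK_SIZE : word); /* sign-extend */
}

/* Byte position of the routine measured from the start of the code block,
 * given that op_00 sits in the middle of the block (byte 0x8000). */
int32_t block_position(uint8_t opcode)
{
    int32_t pos = routine_offset(opcode) + (int32_t)(BLOCK_SIZE / 2);
    assert(pos >= 0 && pos + ROUTINE_SIZE <= BLOCK_SIZE);
    return pos;
}
```

Opcodes $80..$ff produce negative offsets and so land in the first half of the block, which is why the routines have to be assembled in that rotated order.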
>
>-------------------------------------------------------------------------------
>Method 11: The "decode-at-end-of-instruction fixed-size routine" method:
>
>Register a3 always points to op_00 instead of to mloop, and we have:
>
>setup:  subq.l  #2,sp                   ; make room for scratch area on stack
>        clr.w   (sp)                    ; low byte of scratch area is always 0
>        lea.l   op_00(pc),a3            ; a3 always points to op_00
>        move.l  z80ram,a2               ; load pseudopc
>
>mloop:  move.b  (a2)+,(sp)              ;12 Opcode to scratch high byte, inc pc
>        move.w  (sp),d0                 ; 8 high byte is opcode, low byte is 0
>        jmp     0(a3,d0.w)              ;14 Jump to routine.
>                                        ;34 total cycles
>
>Each z80 instruction emulation routine ends with a copy of the decode routine.
>This is faster than method 10, and mloop can be coded anywhere.
>
>Can we avoid using scratch memory and still be as fast? Think about how you
>might do this before you read on. An 8-bit shift of register d0 avoids using
>scratch memory, but is slower (on a plain 68000). The next method shows how to
>make the decode faster and avoid using scratch memory, but it (possibly)
>introduces overhead elsewhere.
>
>-------------------------------------------------------------------------------
>Method 12: The "stretched-address-space fixed-size-routine" method:
>
>This method assumes that the z80 address space (z80ram) is stretched to 128k,
>so that each byte in the z80's address space takes up a word in the Amiga.
>The low order byte of every word must always be 0.
>
>setup:  move.l  z80ram,a2               ; load pseudopc
>        lea.l   op_00(pc),a3            ; a3 always points to op_00
>
>mloop:  move.w  (a2)+,d0                ; 8 Opcode to d0 high byte, inc pc
>        jmp     0(a3,d0.w)              ;14 Jump to routine.
>                                        ;22 total cycles
>
>For best results, every routine ends with the mloop code (decode at end of
>instruction). The instruction decode is faster than method 11, but now many
>instructions will have extra work to do to convert byte z80 addresses to word
>Amiga addresses. Still, this code looks good enough to try.
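To make the stretched layout concrete, here is a small C model of it. It is a sketch under assumptions, not code from the post: the array `stretched` and the helpers `z80_write`, `z80_read`, `fetch_routine_word` and `host_byte_offset` are hypothetical names, and the model ignores hardware registers and ROM.

```c
#include <assert.h>
#include <stdint.h>

#define Z80_SPACE 0x10000u            /* 64k Z80 addresses */

/* One 16-bit word per Z80 byte: value in the high byte, low byte always 0. */
static uint16_t stretched[Z80_SPACE];

void z80_write(uint16_t addr, uint8_t value)
{
    stretched[addr] = (uint16_t)(value << 8);
}

uint8_t z80_read(uint16_t addr)
{
    assert((stretched[addr] & 0x00ffu) == 0);   /* invariant of the scheme */
    return (uint8_t)(stretched[addr] >> 8);
}

/* Instruction fetch: the stored word is already opcode * 256, i.e. the byte
 * offset of a 256-byte routine, so no shift is needed at decode time. */
uint16_t fetch_routine_word(uint16_t *pc)
{
    return stretched[(*pc)++];
}

/* Byte offset of a Z80 address inside the 128k host block: the address
 * doubled. This is the extra work memory-accessing instructions pay. */
uint32_t host_byte_offset(uint16_t addr)
{
    return (uint32_t)addr * 2u;
}
```

The trade is visible in the helpers: the fetch does no arithmetic at all, while every data read or write has to double an address and move the value into or out of the high byte.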
>
>Miscellaneous hint #9: To convert a byte offset to a word offset, use
>"add.w d0,d0", not "lsl.w #1,d0".
>
>Another miscellaneous hint: Maybe there's a use for movep here.
>
>You could maintain 2 copies of the z80 address space --- one in 64k and the
>other in 128k. Then it's just a simple matter of writing a byte to both places
>whenever the z80 does a write. That gets rid of the overhead of converting
>between offset types on memory reads.
>
>But now our method is starting to look like threaded code (method 8) again.
>The threaded code method uses the 128k block to store the offset to the
>handling routine, rather than storing the opcode itself. The overhead in doing
>a memory write is the same in both methods, and threaded code has other
>advantages (like not having to pad code to 256 bytes, multiple-byte opcodes
>handled better). So we're back to threaded code again.
>
>-------------------------------------------------------------------------------
>A note on multiple-byte opcode instructions:
>
>The z80 uses these. They are prefixed with $cb, $dd, $ed and $fd. There are
>also $ddcb and $fdcb prefixed instructions.
>
>For fixed-size methods, (methods 9..12), the fastest way to cope with long
>instructions is to use more tables. But that means multiplying the number of
>tables by 5 or 7. 64k of code has just jumped to 320k or 448k. That's no good
>on a small machine. Also, if you reserve a register to point to op_00 in each
>table, that's 5 or 7 registers gone. Oops.
>
>-------------------------------------------------------------------------------
>Some notes on threaded code:
>
>I spent last evening trying threaded code (method 8) in the Spectrum emulator.
>(See previous post.) I got a 20..40% speed improvement over the
>position-independent standard method (256-way CASE statement). It's still
>several times slower than a real Spectrum, unfortunately. There is some slow
>code in places where there wasn't before, so there is scope for more
>improvement.
>Unfortunately I introduced some bugs during the systematic changes, and they
>are proving hard to track down. Everything in the Spectrum ROM seems to work
>ok. Some machine-code programs that worked before have stopped working.
>
>Threaded code is extremely fast for multiple-opcode instructions. Control is
>vectored directly to the right routine first time, without having to decode
>multiple tables. A problem with this is that if a Spectrum program overwrites
>the second opcode of a multiple-byte instruction ($cb, $dd, $ed, $fd) without
>writing to the first byte, then the emulator doesn't cope.
>
>-------------------------------------------------------------------------------
>Note on hardware registers:
>
>Handling writes to hardware registers in the Spectrum isn't really very hard.
>The z80 has a separate IO address space with a separate set of instructions for
>handling it. (The same is true of the 8088.) The only thing to watch for is
>writing to video RAM. It is fixed size and at a fixed place, so it takes 2
>tests. (There doesn't seem to be a faster way of doing a single bit test ---
>the video RAM doesn't end on a power-of-2 boundary.) I have a separate task
>(sort of) which uses the blitter to keep the screen up-to-date with the video
>RAM. So when there is a write to video RAM, the emulator task doesn't have to
>do much. It just flags the blitter task "Hey, there's something to update in
>character row n, when you wake up". The blitter task doesn't slow the emulator
>down much, because it's mostly running on another processor, and it sleeps when
>there's nothing to update.
>
>-------------------------------------------------------------------------------
>Some notes on flag handling:
>
>Both of the CP/M emulators I know about spend a lot of time handling z80 flags
>(condition codes).
After just about every instruction they do a "move sr,d0"
>or "move ccr,d0" or call GetCC() to get the 68000 flags, then they do a table
>lookup to translate them to z80 format. After every logical instruction (not,
>or, xor etc), a second table lookup is done to set the z80 parity flag. (The
>68000 does not have a parity flag.) These table lookups are slow. In fact,
>they often take several times as long as the guts of the instruction itself.
>
>Both these table lookups are totally unnecessary! It's faster to save the
>flags in 68000 format (in a register). Routines that test flags simply test
>the corresponding 68000 flag. For logical instructions, simply save the parity
>byte away somewhere, and set another bit in the register to say to use the
>parity byte and not v flag. The parity testing instructions (e.g., "jp po,nn")
>look at that bit, and then test either the v flag or the saved parity byte.
>The only times you need to translate flags between z80 and 68000 formats are in
>"push af" and "pop af" instructions.
>
>I got a 10..20% speedup in my Spectrum emulator this way.
>
>-------------------------------------------------------------------------------
>I said in my previous post that threaded code for 8088 emulation would use too
>much memory to be practical. In fact it would be perfectly practical on an
>Amiga equipped with 3 Mbytes or more.
>
>Peter McGavin. (srwmpnm@wnv.dsir.govt.nz)
>
>
>From: Chris_Johnsen@mindlink.UUCP (Chris Johnsen)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Emulator Mechanics [Transpiler]
>Message-ID: <5097@mindlink.UUCP>
>Date: 10 Mar 91 17:54:43 GMT
>Organization: MIND LINK! - British Columbia, Canada
>
>
>     Thank you all for the valued input into this discussion to date, both
>pro and con. I must say I find it very stimulating.
>
>     When discussing this form of platform-porting translator/compiler with
>anyone, I find the need for a short word to use. In a humble attempt to coin
>a descriptive phrase, may I suggest transpiler?
>Charlie Gibbs (Charlie_Gibbs@mindlink.UUCP) Jean-Noel Moyne (jnmoyne@lbl.gov)
>Dave Clemans (dclemans@mentorg.com) Dwight Hubbard (uunet.uu.net!easy!lron)
>Pete Ashdown (pashdown@javelin.es.com) Jyrki Kuoppala (jkp@cs.HUT.FI) confirm
>that, indeed, some research and even program development has been done in this
>direction.
>
>Eddy Carroll (ecarroll@maths.tcd.ie) has written such a transpiler, as a last
>year project, with some success, some reservations.
>
>Chris Gray (cg@ami-cg.UUCP) suggests that the most practicable route to
>designing a viable transpiler would be to make it interactive. BTW Chris, I
>very much enjoyed your compiler articles in the Amiga Transactor.
>
>Ray Cromwell (rjc@pogo.ai.mit.edu) suggested an interesting thought, a sort of
>Usenetware combined effort for development; he also thinks reasonable execution
>speed can be achieved.
>
>     Those are what I perceive to be the ideas supporting the transpiler
>concept. The statements of contrary considerations are more voluminous. These
>appear to fall into a number of categories.
>     o Self-modifying code
>     o Separating code from data
>     o Determining video access
>     o Stack handling
>     o Jump table problems
>     o Handling overlay segments
>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module be
>run concurrent with an emulator type section so that in some parallel way, any
>references within the source executable, which would also be loaded during
>runtime of the emulation, could be validated. It could be argued that this
>should be placed on the pro side of the ledger, but the incurred overhead
>during execution would be large. This function, I had imagined initially,
>would be carried out during the transpiler phase and not attached to, or
>burdening, the runtime execution.
>
>Brad Pepers (pepers@enme1.ucalgary.ca) Jyrki Kuoppala (jkp@cs.HUT.FI) Jonathan
>David Abbey (jonabbey@cs.utexas.edu) were somewhat concerned with
>self-modifying code.
There were a significant number of voices that dismissed
>this concern. Personally, I wouldn't worry about it. If a particular program
>used this technique, for whatever reason, I would accept the fact that not all
>programs can be transpiled.
>
>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) Dave Clemans (dclemans@mentorg.com)
>Sullivan (Radagast@cup.portal.com) Jean-Noel Moyne (jnmoyne@lbl.gov) Chris
>Gray (cg@ami-cg.UUCP) raised concerns about distinguishing code from bytes of
>the data persuasion. This is what I had thought would be the biggest stumbling
>block. I hadn't thought there would be so many others. :-) It's great to hear
>how long the road is before you begin the journey.
>
>Udi Finkel (finkel@TAURUS.BITNET) Eddy Carroll (ecarroll@maths.tcd.ie) are
>concerned about determining when the access to memory is video ram. This, I
>believe, can be solved, if it is dealt with in the same manner as the code/data
>resolution. Please read on.
>
>Eddy Carroll (ecarroll@maths.tcd.ie) points out problems with processor stack
>handling. How did you resolve this with the transpiler you developed? It
>occurs to me that every emulator, whether it be a transpiler or an interpretive
>type, must handle the stack properly. To me, it would be more difficult to
>approach this problem if I were writing an emulator, as opposed to a
>transpiler, as fewer executable size constraints would inhibit producing an
>intelligent stack handler. On the 68000 the stack would be handled as
>word/longword only. Any bytewise stack functions within the source executable
>would be transpiled into wordwise handling. A problem, but surmountable.
>
>Chris Gray (cg@ami-cg.UUCP) Eddy Carroll (ecarroll@maths.tcd.ie) raised the
>problem of coping with jump or branch tables properly. Weighty point. This I
>would describe as a grey area between code and data where it may be actual jump
>instructions, or a table of label locations, the code to be emulated
>calculating an offset into it.
This must be handled in a similar fashion to
>the code/data recognition problem.
>
>Dave Clemans (dclemans@mentorg.com) Kurt Tappe (JKT100@psuvm.psu.edu) Jonathan
>David Abbey (jonabbey@cs.utexas.edu) were concerned with the handling of
>overlay segments. This bothers me too. With the Amiga, it would be possible
>to either use overlays, in the case of very large programs, or convert to
>all-in-one programs by consolidating the overlay hunks. The main problem is
>determining that overlays are employed, loading them and transpiling. This is
>closely related to the code/data determination problem.
>
>Dwight Hubbard (uunet.uu.net!easy!lron) suggests that this would work, kind of
>like DSM. I have DSM and must say it was a model for me when contemplating
>this idea. Chris Gray (cg@ami-cg.UUCP) believes that an interactive compiler
>holds some promise. In one of my initial messages to Charlie Gibbs I
>entertained the reality of having to fall back on human intervention, at the
>transpiler output source code level. This would be determined at runtime,
>when the emulated program failed to operate correctly or at all. A programmer
>would be required to fix the problem.
>
>     What I interpret the huge human parallel processor the net represents
>to be saying is that human intervention must be employed. My only reservation
>in this regard is that there are interpretive emulators that run a wide
>variety of software on-the-fly. To me, this feat is more difficult than a
>transpiler, which can ruminate over the executable for as long as, say, a
>raytracer.
>
>     To incorporate the above concepts, and arrive at a workable resolution,
>what is required is an expert system, probably requiring a resident expert (not
>software, though brain matter is soft), to resolve the code/data, video access,
>jump/branch table and overlay problems. I don't suppose that the average
>emulator user is prepared to deal with or even understand the problems we have
>been discussing.
I don't think it would be worthwhile writing a transpiler
>unless it could be made to work in some standalone fashion. Feed it the
>source executable and out pops the Amiga version.
>
>     Mind shifting stage left...
>
>     I have a suggestion, since it is not considered practicable to write a
>transpiler that will work unassisted, that may resolve the perceived problems
>using some of the concepts from this message thread.
>
>     First, an off-the-wall analogy. Here he goes again. :-)
>
>     Consider a desirable program, running on another platform, useful to
>an Amiga user, but with a very large dongle attached. This useful program must
>be run on the dongle in fact. This is inconvenient. What do I do if I lose my
>dongle? I propose you think of this as a form of copy protection for a
>moment, obstructing the Amiga owner, who also owns a program that runs on
>another platform. Follow me so far? The legitimate program for the other
>platform cannot be used on the favoured Amiga machine, so it needs to be
>decopy-protected. What do you use if you wish to backup and/or deprotect
>software? A copier program. Most copiers can be employed by users who have
>little knowledge of copy protection, yet they succeed in making the copy. The
>more difficult protection schemes require "brain files" which are written by
>experts to achieve this end. End of transpiler/copy-protection analogy.
>
>     Take a basic transpiler, such as the one Eddy Carroll wrote, and add a
>toolkit. The transpiler would do its best to translate and compile the source
>executable. If this failed, an expert, not just the average user, would either
>run the transpiler in expert mode, or a debug tool from the toolkit, which
>would work through any problem areas. The result of this would be an expert
>transpiler file. The beauty of this approach is that any user could run the
>transpiler, given access to the expert file.
These expert files could be
>included with the release package and any new ones could be included in any
>updates or shared by conventional electronic means as PD.
>
>What do you think?
>
>csj
>
> No really officer, I wasn't speeding, just keeping a safe distance in front of
>the car behind me!
>
>Usenet: a542@mindlink.UUCP Phone: (604)853-5426 FAX: (604)854-8104
>
>
>From: daveh@cbmvax.commodore.com (Dave Haynie)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics
>Message-ID: <19792@cbmvax.commodore.com>
>Date: 12 Mar 91 22:39:53 GMT
>References: <1991Mar7.093149.18707@mintaka.lcs.mit.edu> <19749@cbmvax.commodore.com> <1991Mar12.011418.24768@mintaka.lcs.mit.edu>
>Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
>Organization: Commodore, West Chester, PA
>
>In article <1991Mar12.011418.24768@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>>In article <19749@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>>>In article <1991Mar7.093149.18707@mintaka.lcs.mit.edu> rjc@geech.ai.mit.edu (Ray Cromwell) writes:
>
>>>> Why doesn't AMAX multitask?
>
>>>Without massive patching of the Mac ROM, I don't think so.
>
>>   Well it's not a heavy loss if you can't make the Mac ROM resident, but
>>why does AMAX have to take over the Amiga's operating system? The only
>>thing that would make it really difficult to run MacOS under
>>AmigaDOS is if Mac code fiddles with absolute memory locations or the
>>OS implements function calls as traps/interrupts.
>
>Well, the Mac OS fiddles with absolute memory locations, and the OS implements
>function calls as line-A exceptions. Apparently, all the absolute locations
>are in low memory and get swapped as part of the process context when you run
>Multifinder.
>
>>I also wonder why ReadySoft used a special disk format for Mac disks
>>instead of reading Amiga disks.
>
>No doubt this was to make Mac software work on Amiga disks. The Mac isn't
>as nice about filesystem independence as the Amiga is, so you can't really
>provide a loadable Mac filesystem kind of thing that maps Mac filespace
>into Amiga filespace adequately. So ReadySoft essentially just built a new
>device driver, which uses Amiga-readable formats, but Mac filesystem
>organization.
>
>This is apparently why Macs don't generally talk to MS-DOS disks via an
>alternate filesystem like CrossDOS or MSH, but instead use a user program for
>the conversion, along the lines of the old PC Utilities.
>--
>Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
>   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
>      "What works for me might work for you" -Jimmy Buffett
>
>
>From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: Emulator Mechanics [Transpiler]
>Message-ID: <1338@macuni.mqcc.mq.oz>
>Date: 13 Mar 91 03:24:04 GMT
>References: <5097@mindlink.UUCP>
>Organization: Macquarie University, Sydney, Australia.
>
>In article <5097@mindlink.UUCP> Chris_Johnsen@mindlink.UUCP (Chris Johnsen) writes:
>>Ian Farquhar (ifarqhar@sunb.mqcc.mq.oz.au) suggested that a compiled module be
>>run concurrent with an emulator type section so that in some parallel way, any
>>references within the source executable, which would also be loaded during
>>runtime of the emulation, could be validated. It could be argued that this
>>should be placed on the pro side of the ledger, but the incurred overhead
>>during execution would be large. This function, I had imagined initially,
>>would be carried out during the transpiler phase and not attached to, or
>>burdening, the runtime execution.
>
>No, I suggested that an image of the original code, without translation,
>be carried around with the compiled code, with indexes into the compiled
>code's equivalent sections that would allow jump tables and so forth to
>function correctly. This would be a memory intensive but low-time-overhead
>way of resolving problems. It should also be noted that such a
>system would allow trapping of addresses that need to be handled by
>special code (hardware locations etc.), and that the image is also used
>for data storage, meaning that once loaded and allocated, the program
>would be unlikely to need any further allocation of heap.
>
>This system solves two problems: the data/code differentiation (if you
>accidentally compile some data, it is no great problem), though a much
>more minor problem remains if a couple of opcodes are missed because of
>data being accidentally compiled and the compiler assuming an opcode is
>the data for a false opcode; and also the problem of jump tables
>and jumping to locations stored in registers.
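One way to picture the scheme described above is as two parallel arrays: the untranslated image, and an index from original addresses to compiled blocks. The C sketch below is an illustration only and makes several assumptions of its own: a 64k address space, little-endian 16-bit table entries (as on the 8088, Z80 and 6502), and the hypothetical names `Translated`, `map_entry` and `jump_via_table`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPACE 0x10000u
#define NOT_COMPILED 0      /* index value meaning "no compiled block starts here" */

typedef struct {
    uint8_t  image[SPACE];  /* untranslated original program, also used as data */
    uint16_t index[SPACE];  /* original address -> compiled block number (1-based) */
} Translated;

/* Record that the compiled block numbered `block` begins at original address `addr`. */
void map_entry(Translated *t, uint16_t addr, uint16_t block)
{
    assert(block != NOT_COMPILED);
    t->index[addr] = block;
}

/* Emulate an indirect jump through a table of little-endian 16-bit addresses
 * held in the original image. Returns the compiled block to run, or
 * NOT_COMPILED if the target was never translated (caller must fall back). */
uint16_t jump_via_table(const Translated *t, uint16_t table, uint8_t slot)
{
    uint16_t entry  = (uint16_t)(table + 2u * slot);
    uint16_t target = (uint16_t)(t->image[entry] |
                                 (t->image[(uint16_t)(entry + 1u)] << 8));
    return t->index[target];
}
```

Because the jump table is read from the image at run time, the translator never has to decide in advance whether those bytes were code or data; the cost is the extra memory for the index.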
>
>--
>Ian Farquhar                      Phone : + 61 2 805-9400
>Office of Computing Services      Fax   : + 61 2 805-7433
>Macquarie University  NSW  2109   Also  : + 61 2 805-7420
>Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au
>
>
>From: srwmpnm@windy.dsir.govt.nz
>Newsgroups: comp.sys.amiga.emulations
>Subject: Re: CPU-emulators
>Message-ID: <18878.2801a4ad@windy.dsir.govt.nz>
>Date: 9 Apr 91 11:25:31 GMT
>Organization: DSIR, Wellington, New Zealand
>
>Ilja Heitlager (iheitla@cs.vu.nl) wrote:
>>I'm planning to write a 6502 (and maybe when I like it some others) emulator.
>
>Good on you! I've played around with the Z80 emulators for the Amiga, by Ulf
>Nordquist and Charlie Gibbs, making them faster. I have never touched 6502 but
>the same techniques should apply.
>
>>At this moment I think there are two ways of doing it:
>>  1- Compare every Opcode and jump to a routine which executes the
>>     instruction
>>  2- Do it more or less the way the microcode does it.
>>     Ok in software you can't do more operations at the same moment.
>
>I found several more fundamentally different ways of doing it, and many
>variations on those. So far the fastest practical method seems to be threaded
>code. You can avoid decoding an opcode for every 6502 instruction altogether!
>The emulation routine for each 6502 opcode ends with:
>
>        move.l  (a3)+,a0
>        jmp     (a0)
>
>So each emulation routine jumps directly to the next emulation routine without
>any decoding at all. Register a3 acts like a "pseudo pc" into a 256 kbyte
>table in which there is a longword pointer to the emulation routine for each
>corresponding opcode in the 64 kbyte 6502 address space.
>
>Now, every time the 6502 writes to RAM, you need to update an entry in the
>256 kbyte table. At first it looks as if you have to do an instruction decode
>to compute the new table value every time the 6502 writes to RAM. But in fact
>that is not necessary either!
>
>What you do, when the 6502 writes to RAM, is to write a constant address into
>the table.
That constant address points to a special routine called "patch".
>When patch is called, you finally get to do an instruction decode. Patch
>computes the address of the routine for the current instruction, stuffs it
>in the 256 kbyte table, then jumps to the routine for the current instruction.
>Next time this instruction is executed, control bypasses patch and goes
>directly to the right routine.
>
>A variation of this method which saves memory but is slightly slower, is to use
>word offsets in a 128 kbyte table, instead of longword addresses in a 256 kbyte
>table. Each routine ends with:
>
>        move.w  (a3)+,d0
>        jmp     0(a2,d0.w)
>
>where a2 holds the base from which all the routine offsets are computed.
>
>This method has more advantages:
>
>1: To handle known ROM entry points, just point the vector for the entry point
>at an optimised 68000 routine to do what the ROM routine does. There is no
>overhead at all in checking for ROM entry points.
>
>2: To handle multiple-byte opcodes (e.g., prefix instructions), patch can be
>made smart enough to point the vector for the prefix byte to the routine for
>the entire instruction. There is no need to decode opcodes after the prefix
>every time the instruction is executed.
>
>3: Patch can be made smart enough to recognise common sequences of 6502
>instructions, and to point the vector at an optimised 68000 routine for the
>whole sequence.
>
>Note that 2 and 3 above (if implemented) won't correctly emulate certain types
>of self-modifying code.
>
>There was a good article on "Portable Fast Direct Threaded Code" by Eliot
>Miranda in comp.compilers recently. He uses GCC to write "machine independent"
>threaded code that is just about as efficient as my 68000-specific code.
>
>Hope this helps. Regards, Peter McGavin. (srwmpnm@wnv.dsir.govt.nz)
>
>
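The lazy "patch" idea can be tried out in portable C before committing to assembly. What follows is a toy sketch under stated assumptions, not Peter McGavin's code: it handles only three single-byte 6502 opcodes (INX $E8, NOP $EA, and BRK $00, with BRK and everything else treated as "halt"), it uses a table of C function pointers called from a loop rather than routines that jump to one another, and all identifiers (`Cpu`, `patch`, `cpu_write`, `cpu_run`, ...) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SPACE 0x10000u

typedef struct Cpu Cpu;
typedef void (*Handler)(Cpu *);

struct Cpu {
    uint8_t  ram[SPACE];
    Handler  thread[SPACE];   /* one routine pointer per emulated address */
    uint16_t pc;
    uint8_t  x;
    int      halted;
    unsigned decodes;         /* how many times patch() had to decode */
};

static void op_inx(Cpu *c) { c->x++; c->pc++; }   /* $E8 INX */
static void op_nop(Cpu *c) { c->pc++; }           /* $EA NOP */
static void op_brk(Cpu *c) { c->halted = 1; }     /* $00 BRK, treated as halt */

/* Decode once, cache the routine in the table, then run it. */
static void patch(Cpu *c)
{
    Handler h;
    switch (c->ram[c->pc]) {
    case 0xE8: h = op_inx; break;
    case 0xEA: h = op_nop; break;
    default:   h = op_brk; break;   /* toy: everything else halts */
    }
    c->decodes++;
    c->thread[c->pc] = h;
    h(c);
}

void cpu_reset(Cpu *c)
{
    memset(c->ram, 0, sizeof c->ram);
    for (uint32_t a = 0; a < SPACE; a++)
        c->thread[a] = patch;       /* nothing decoded yet */
    c->pc = 0; c->x = 0; c->halted = 0; c->decodes = 0;
}

/* Every emulated store also invalidates the cached routine for that address. */
void cpu_write(Cpu *c, uint16_t addr, uint8_t value)
{
    c->ram[addr] = value;
    c->thread[addr] = patch;
}

void cpu_run(Cpu *c, uint16_t start)
{
    c->pc = start;
    c->halted = 0;
    while (!c->halted) {
        assert(c->thread[c->pc] != 0);   /* cpu_reset() must have been called */
        c->thread[c->pc](c);
    }
}
```

Running the same three-instruction program twice should leave `decodes` unchanged on the second pass, and overwriting one byte with `cpu_write` should cost exactly one more decode. A real emulator would also need the multi-byte and self-modifying-code caveats noted in the post.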