Collection of Hack-Phreak Scene Programs

Text File | 1993-06-24 | 42KB | 980 lines

_____________________________ Subj: VGA Writes _____________________________
Fm: Activision/Infocom 76004,2122  # 193075
To: Dan R Corritore 70243,1110 (X)  Date: 29-Jul-92 13:24:20

Here's a question. If I'm writing bytes/words (depending on the card) from system ram to a VGA mode 13H display, and I interleave some processing between byte/word writes, will the VGA hardware and the AT-BUS still wait state me?

In other words is:

    MOVE VID,AX
    JSR SOMEWHERE
    MOVE VID,AX
    JSR SOMEWHERE
    etc...

Faster than...

    MOVE VID,AX
    MOVE VID,AX
    etc.
    JSR SOMEWHERE
    JSR SOMEWHERE

Thanks,
William Volk
...........................................................................
Fm: Hans Peter Rushworth 100031,473  # 193218
To: Activision/Infocom 76004,2122  Date: 29-Jul-92 19:20:06

For a 386 processor (I think also for a 286), there are independent bus and execution units that are able to overlap execution to some extent. This means that external cycles should not affect instructions that are pre-fetched and ready for execution. I'm not sure to what extent the internal cache of the 486 assists here. So the answer to your question is probably yes, although I would suggest using this feature to do work on internal registers rather than CALL subroutines. I suggest you write a test program to determine how much the improvement is.

About the best way to speed performance when writing words to VGA is to ensure that the target address is on an even boundary. A simple bit of code should explain this (part stolen from BC++ memcpy) -

    ;
    ; CX = pixel count (!= 0), DS:SI-> source bitmap, ES:DI->VGA RAM
    ;
            test    di,1            ;odd address ???
            je      short is_even   ;no, begin with even address
            movsb                   ;make it even and move a pixel
            dec     cx              ;fix pixel count (if CX was zero on entry we're in trouble)
    is_even:
            shr     cx,1            ;convert to words and set carry if odd number of bytes
            rep     movsw           ;copy stuff (doesn't affect carry) aligned so 1 mem cycle
                                    ;every time for a 16-bit VGA
            adc     cx,cx           ;inc CX (set it to 1) if carry set (ends on odd address)
            rep     movsb           ;NOTE: doesn't do move if CX is zero

The separate bus I/F overlap and pre-fetch will help absorb the extra instructions.

Hope that helps.
Peter.
...........................................................................
Fm: John W. Ratcliff 70253,3237  # 270346
To: Serge Mathieu 71035,2771 (X)  Date: 30-Dec-92 14:11:11

Serge,

About the VGA screen copy stuff. I do a REPE CMPSD, to do a double word compare to find all double words that are the same. Then I do a REPNZ CMPSD to find the number of double words that are different. Then back off the pointers, and move only the double words that are changed to screen ram, and your system ram copy of screen ram. Then back up to the top of the loop. I know it sounds crazy but VRAM is so slow that this turns out to be anywhere from a lot faster to many, many, times faster on really slow VGA cards.

A REPE CMPSW works just as well, but use those 32 bit ops whenever you can. The disadvantage to this method is you use up more of your precious system ram. The advantages are huge, and I hope obvious.

I haven't really thought about this approach for a panning/scrolling type environment. I think more from the simulation standpoint where you are re-rendering an entire screen, at extremely high frame rate, and most of the pixels are the same color from frame to frame, just by the nature of your rendering system. If most of the pixels are changing, then this method sucks. My wireframe demo points this out quite nicely.
...........................................................................
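For readers who would rather see the compare-then-copy idea in C, here is a minimal sketch of the loop John describes. It is not his code: the function name `blit_changed_dwords` and the three-buffer layout are assumptions made for illustration, and ordinary arrays stand in for video RAM (`vram`), the system RAM copy of the screen (`shadow`) and the newly rendered frame (`frame`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy only the dwords of `frame` that differ from
 * `shadow` into both `vram` and `shadow`. All buffers hold `ndwords`
 * 32-bit words. Returns how many dwords were written to `vram`. */
size_t blit_changed_dwords(uint32_t *vram, uint32_t *shadow,
                           const uint32_t *frame, size_t ndwords)
{
    size_t written = 0;
    size_t i = 0;

    assert(vram != NULL && shadow != NULL && frame != NULL);

    while (i < ndwords) {
        /* Skip the run of identical dwords (the "compare while equal" step). */
        while (i < ndwords && frame[i] == shadow[i])
            i++;

        /* Measure the run of differing dwords (the "compare while not equal" step). */
        size_t start = i;
        while (i < ndwords && frame[i] != shadow[i])
            i++;

        size_t run = i - start;
        if (run > 0) {
            /* Only the changed run touches the (slow) video buffer. */
            memcpy(&vram[start], &frame[start], run * sizeof(uint32_t));
            memcpy(&shadow[start], &frame[start], run * sizeof(uint32_t));
            written += run;
        }
    }
    return written;
}
```

The return value gives a rough feel for when the method pays off: if it is close to `ndwords` on most frames, nearly every pixel is changing and the extra compare pass is wasted work, which matches John's caveat.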
Fm: Randy @ Safari 71165,3600  # 283754
To: Serge Mathieu 71035,2771  Date: 22-Jan-93 19:49:11

I don't know if you got a solid reply yet but on all my systems, when blitting to video memory, writing words to odd addresses will slow the blit down by as much as 50%, depending on the video card. In normal RAM, odd word writes will cause a penalty of 33%, regardless of cache or processor speed.

This is because instead of accessing the MMU (memory management unit) and writing once, the instruction has to write, access the MMU, and write again.
...........................................................................
Fm: John Dlugosz [ViewPoint] 70007,4657  # 341935
To: VOR Technologies Inc 71333,134 (X)  Date: 26-Apr-93 08:38:36

A word-aligned MOVSD does not save that much over MOVSW's. The video bus is the bottleneck and it still copies the same number of bytes. I used to know how many bus cycles it took for different kinds of transfers, but I've forgotten. The numbers I just got empirically indicate 5 or 6 bus cycles per word. That seems high. But it depends on the card's buffer for receiving data before actually storing it in its own RAM, and how fast that can get processed depends on the card's timing speed (ram will be busy displaying and can't be stored to) as well as how the card was made. A read cycle, I recall, takes 850 to 1050 ns all told (but reads can't use the buffer).

A single function call will take as much as 20 machine clocks (on a '386), which is _nothing_ compared to the time of copying the memory to video. About 5 pixels worth!

--John

______________________ Subj: Local Variables On Stack ______________________
Fm: Hans Peter Rushworth 100031,473  # 261886
To: Mark Betz/Ass't SysOp 76605,2346 (X)  Date: 14-Dec-92 22:26:11

On the subject of tweaks, have you tried aligning all your local variables so the double words and words are all aligned?
You can do this by ordering your locals so the dwords, words and bytes are all grouped together, then clearing the lower two bits of the BP (you also have to ensure that there is a dummy dword variable at the bottom of the stack frame to cater for the potential "drop" of the BP). You also have to copy the parameters (if any) to the locals area.

I think this is potentially worthwhile for those functions that exist for a longish period of time, and where you "run out" of registers.

Peter.
...........................................................................
Fm: Hans Peter Rushworth 100031,473  # 262001
To: Mark Betz/Ass't SysOp 76605,2346 (X)  Date: 15-Dec-92 03:02:28

>> What's the effect of clearing the low 2-bits of BP? You're just
   subtracting a max of decimal 2 from the value, correct?

Actually a max of 3 <g>. The reason behind this is that it makes the SS:BP point at a long word address boundary. This means that if (say) you were to execute the instruction:

    LES SI, DWORD PTR [BP-4]   ; (for sake of argument) reads 4 bytes of data

then a 32 bit wide data bus CPU only needs to do 1 memory bus cycle to read or write the 4 bytes of data. For this to be effective, you would organise your local variables so that all the dwords are at the top of the stack frame, all the words are under that, and finally all the bytes under that. This then guarantees that when you access a local variable, the minimum number of memory cycles are needed. Sometimes when the function is called the stack will be correctly aligned, and this makes no difference, but other times it may not.

Peter.
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346  # 262161
To: Hans Peter Rushworth 100031,473 (X)  Date: 15-Dec-92 13:42:06

Right, 3. That's two bits worth, correct? <g>.
Let me make sure I understand this: the idea is that the stack will be dword aligned for 32-bit accesses, and that it won't make any difference to byte-wide or word-wide accesses. One thing still confuses me (p'raps more than one). Let's say that you have a stack frame that looks like this on entry to a function:

    dword  <- BP + 14
    dword  <- BP + 10
    word   <- BP + 8
    word   <- BP + 6
    byte   <- BP + 4
    byte   <- BP + 2
    word   <- saved BP

Suppose that BP == 11CA. Clear the lower two bits and you have 11C8. Now BP points to the word right below the saved BP. Do you simply add in an offset in order to correctly address the stack values now? MOV AH,[BP+4] gets the first byte parameter from the stack, instead of MOV AH,[BP+2]. You basically have to add 2 to all of your offsets. Is that how you'd handle it?

--Mark
...........................................................................
Fm: Hans Peter Rushworth 100031,473  # 262239
To: Mark Betz/Ass't SysOp 76605,2346 (X)  Date: 15-Dec-92 16:10:10

The precise location of the locals [BP-n] doesn't matter naturally, but the function parameters cannot be accessed using [BP+n], so you need to copy them into the local data area, and use the local copies. Just copy the BP into BX (for example) before masking it, and then use BX to move the parameters.

Two other things:

(1) You have to allocate an extra (dummy) long word local at the bottom of the local stack frame, so that if BP is decremented by 3 accessing the bottom local won't blow the stack.

(2) You need to save the original (unmasked) BP so that you can copy this value back into SP at the end of the function. Normally you copy BP->SP; instead you just load SP from this saved value (which could be another local).
Example:

            push    bp              ;save stack frame
            mov     bp,sp           ;new stack frame
            mov     bx,bp           ;keep copy of original frame
            and     bp,0FFFCh       ;align BP
            sub     sp,FRAMESIZE    ;allocate space
            push    di              ;save register variables
            push    si              ;
            mov     [bp-SAVEBP],bx  ;save original BP
            mov     ax,ss:[bx+6]    ;get arg1
            mov     [bp-param1],ax  ;copy it
            ... same for other args
            --- rest of function ----
            pop     si              ;restore register vars
            pop     di              ;
            mov     sp,[bp-SAVEBP]  ;original frame
            pop     bp              ;callers bp
            retf                    ;exit

Peter.

_____________________ Subj: Function Arguments on Stack _____________________
Fm: Hans Peter Rushworth 100031,473  # 261555
To: Jesse 76646,3302 (X)  Date: 14-Dec-92 14:10:26

>> what IS pushed by a function automatically?

Normal function:

    High memory
        argn              <--  calling function pushes arguments in reverse order
        arg2                   eg f( arg1, arg2, ..., argn )
        arg1
        <return address>  <--- the CALL instruction pushes the IP or CS:IP
                               (model dependent) of the next instruction of
                               the calling func
    _______________________ENTERS NEW FUNCTION_________________________________
        BP                <--- The called function saves the old Base pointer
                               and moves this stack address into the new BP
        <local variables> <--- The stack pointer is adjusted to make space for
                               the function's local (automatic) variables
        DI                <--- The called function pushes register variables
    SP: SI                     (I may have the order wrong here)
        <rest of stack>   <--- used for temporary pushes and pops, other
                               function calls and interrupts.
    Low memory

Cleanup on exit: first SI and DI are popped, then BP is copied back into SP; SP now points at the calling function's BP on the stack. BP is popped and a RET is performed, returning to the calling function. The calling function will usually do an ADD SP,n to "remove" the arguments it placed on the stack. The function return value is placed in AX or DX:AX depending on the size.

The called function accesses variables using BP: positive offsets are used for the function's actual parameters, and negative ones for the locals.
When an interrupt occurs (hardware or software) the following is pushed on the stack by the CPU

        FLAGS register    <--  after the push the interrupt mask is set.
        CS
        IP
        AX
        BX
        CX
        DX
        ES
        DS
        SI                <--- Register variables are automatically saved
        DI                     neither function explicitly saves or restores them
        BP                <--- The SP at this address is copied into the BP below
    ____________________________________ Enters interrupt handler_______
        <local variables> <--- The function does a SUB SP,n to allocate space
                               for local variables, and sets up DS to point to
                               the handler's data segment.
    SP: <rest of stack>   <--- used for push pop etc

Cleanup: The BP is copied back into the SP, and an IRET instruction is executed, which reloads all the registers. (The function may modify the registers on the stack to return values if this is a software interrupt, but for hardware interrupts the registers on the stack must be READ ONLY.)

I hope that is more or less a correct description.

Peter.

BTW, did you realise that the LOOP instruction on a 386/486 is actually SLOWER than the equivalent separate decrement and branch instructions?
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346  # 266171
To: Jesse 76646,3302 (X)  Date: 22-Dec-92 11:35:30

Hi, Jesse. If you're saving all of the registers, then there's probably no harm in using PUSHA/POPA, unless there are some hidden side effects that I'm not aware of. However, there are registers that you don't need to save, even if you're using them. AX is one, since the compiler expects it to be used for return values. Also, SP isn't really restored, since its value is discarded by the POPA instruction, not copied back into the register. So there's 2 that the instruction isn't needed for. That leaves 6. A PUSHA takes 18 clocks on the 386, while PUSH only requires 2. POPA takes 24 clocks on the 386, and POP takes 4. So you're wasting 6 clocks on the PUSHA, but the POPA works out even.
If the function is one that is called in a tight loop, say 10,000 times, then you're blowing off 60,000 clocks <g>.

_______________________ Subj: 386 Instruction Timing _______________________
Fm: KGliner 70363,3672  # 318190
To: all  Date: 21-Mar-93 22:53:29

A simple asm question for you all:

On a 386, do these instructions take the same amount of time or is one faster than the other:

    mov [si],al
    mov [si + 512],al
...........................................................................
Fm: Mike W. Smith 75300,3434  # 318283
To: KGliner 70363,3672 (X)  Date: 22-Mar-93 01:59:24

KG>On a 386, do these instructions take the same amount of time or is one
KG>faster than the other:
KG>   mov [si],al
KG>   mov [si + 512],al

On a 386, a "MOV mem,reg" is 2 clock cycles for any effective address.
...........................................................................
Fm: Randy @ Safari 71165,3600  # 318977
To: Mike W. Smith 75300,3434 (X)  Date: 23-Mar-93 09:20:46

Except when an offset is used and that takes 1 cycle per byte of offset. If the offset is BYTE, one cycle. If the offset is WORD, two cycles. At least that's what Turbo Profiler says when I run the test.

Randy
...........................................................................
Fm: Mike W. Smith 75300,3434  # 319385
To: Randy @ Safari 71165,3600  Date: 23-Mar-93 23:50:43

RS>Except when an offset is used and that takes 1 cycle per byte of offset.
RS>If the offset is BYTE, one cycle. If the offset is WORD, two cycles.

That's different from what my tech reference says. A MOV reg,mem takes the same time whether it's a byte or word. For 8086/88 processors the effective address adds anywhere from 5 to 14 clock cycles to an instruction. For 286/386 processors, the only case where an extra clock is added is when all three indexing elements are used (base, index, and displacement).
...........................................................................
Fm: Randy @ Safari 71165,3600  # 320207
To: Mike W. Smith 75300,3434 (X)  Date: 25-Mar-93 09:22:21

->That's different from what my tech reference says. A MOV reg,mem
->takes the same time whether it's a byte or word

That's true, except when you use an offset like [si+512] which causes an ADD to be performed prior to the fetch. My timings are as such

    10,000 iterations (includes loop time)

    mov al,[si]        .0047
    mov al,[si+512]    .0051/.0052 (fluctuated)

That was the original question, right?

_____________________________ Subj: 32 Bit Code _____________________________
Fm: Sarwan Narine 76675,164  # 319332
To: All  Date: 23-Mar-93 22:17:49

Consider the following instructions:

    #1. MOV AL, DS:[SI]
    #2. MOV AL, DS:[ESI]

These instructions can be used to accomplish the same task. However, under certain circumstances instruction #2 will fail. If SMARTDRV is _not_ loaded then instruction #2 causes a hang-up; however, a CTRL-ALT-DEL will reset my system. What does SMARTDRV do to enable instruction #2 to execute? BTW, my program is written for 32-bit CPUs only.

Thanks for any insight.
...........................................................................
Fm: Jaimi McEntire 71700,1202  # 319345
To: Sarwan Narine 76675,164  Date: 23-Mar-93 22:38:05

Sarwan, if your code before that loaded si (because you wanted a word), the esi register could have trash in the upper 16 bits. In that case, you would definitely need to either clear out esi (mov esi,0) before loading it, or you would need to extend it as you moved it (cwde).

Just as a side note, you can of course use any 32 bit register as an index on the 386, if you did not know that. Also, you can use FS and GS too. All you need to do (if you are using borland c) is compile by assembly.

P.S. Smartdrv probably enables #2 to execute because it clears the registers, because it too has 32 bit code.

Jaimi
...........................................................................
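Jaimi's point about stale upper bits can be modelled in a few lines of C. This is only a sketch of the register behaviour, not code from the thread: the helper names `load_si` and `load_esi_zero_extended` are made up for illustration, and a `uint32_t` stands in for ESI. On the assembly side, `movzx esi, si` is one common way to get the zero-extended result (CWDE, for reference, sign-extends AX into EAX).

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mov si, value": only the low 16 bits of ESI change,
 * whatever was in the upper 16 bits stays there. */
uint32_t load_si(uint32_t esi, uint16_t value)
{
    return (esi & 0xFFFF0000u) | value;
}

/* Model of a zero-extending load: the upper 16 bits are cleared,
 * so the 32-bit value equals the 16-bit one. */
uint32_t load_esi_zero_extended(uint16_t value)
{
    return (uint32_t)value;
}
```

With leftover data such as 0xDEAD0000 in the register, `load_si` yields an offset far beyond 64K, which is the situation that makes `[ESI]` misbehave while `[SI]` works.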
Fm: Bruce Nehlsen 76535,2466  # 319363
To: Jaimi McEntire 71700,1202 (X)  Date: 23-Mar-93 23:00:26

Sarwan -

Another comment, since I had the same problem, except my code would CRASH if and only if EMM386 was loaded. Anyway, in my case it turned out to be my assembly directives. In some modules I used <.code, .data> , and in some I used < DATA SEG ">. In one of those 2 methods, the assembler was inserting ENTER and LEAVEs, which made a big mess, since I already had those in there.

Bottom line - check the .LST file, and ensure that what you WROTE is what the assembler generated.

Later...
...........................................................................
Fm: Dan Corritore 70243,1110  # 319394
To: Sarwan Narine 76675,164  Date: 24-Mar-93 00:05:40

Another thing I'd like to add to what the others have said is that you can't use a value greater than 65535 in ESI if you have not changed the segment limit stuff on the computer. The 386 (or higher) computer starts up with the segment limits set to be 65535.

If you can, always debug 386-specific code with a 386-specific debugger.. it'll allow you to see things other debuggers won't (and also capture exceptions --one of which is activated by using invalid segment limits).

_Dan
...........................................................................
Fm: rod lentz 71163,57  # 319438
To: Dan Corritore 70243,1110 (X)  Date: 24-Mar-93 02:53:58

re: segment limits, &c...

By my understanding, most machines running under DOS these days are actually running in a virtual 86 (managed by emm386 or similar) most of the time. In v86 mode, as I understand it, the segment limit is always 0xffff. Therefore, without switching to protected mode (via VCPI/DPMI/whatever), there shouldn't be any advantage to using a dword subscript (such as [esi]). Anybody care to confirm/refute this ?

- Rod
...........................................................................
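The limit rule that Dan and Rod are discussing boils down to one comparison: the last byte an access touches must not lie beyond the segment limit. A minimal C model is sketched below, assuming an ordinary expand-up data segment; the function name `access_within_limit` and the constant `REAL_MODE_LIMIT` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 64K limit (65535) that the thread says applies at startup and in v86 mode. */
#define REAL_MODE_LIMIT 0xFFFFu

/* Returns true if an access of `size` bytes at `offset` stays inside a
 * segment whose highest valid offset is `limit`. */
bool access_within_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
    assert(size > 0);
    /* Last byte touched; computed in 64 bits so offset + size cannot wrap. */
    uint64_t last = (uint64_t)offset + size - 1;
    return last <= limit;
}
```

With the limit at 0xFFFF, any offset of 0x10000 or more fails the check however the register was loaded, which is why a 32-bit index buys nothing there; with a 4 gig limit the same offset passes.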
Fm: Rob Nicholson (HMS Ltd) 100060,154  # 319468
To: rod lentz 71163,57 (X)  Date: 24-Mar-93 05:40:20

There appears to be a 'fudge' that allows the segments to be >65535 in real mode. One of the memory managers or disk caches (can't remember which) left the segments unbounded.

Rob.
...........................................................................
Fm: Dan Corritore 70243,1110  # 319725
To: rod lentz 71163,57 (X)  Date: 24-Mar-93 15:57:14

You are right. There is no advantage at all to using ESI over SI without going into protected mode and switching the segment limit stuff yourself (or having one of those DOS extenders, I believe). If you have to do it yourself, get a 386 or 486 specific book which deals with that kind of stuff (or both).. I'm planning on doing so one day when I feel the need for stuff like that.

Well, anyway, that stuff doesn't stop you from using the 32-bit registers and 32-bit instructions, though, so play with them all you want!<g>

_Dan
...........................................................................
Fm: rod lentz 71163,57  # 319954
To: Dan Corritore 70243,1110  Date: 24-Mar-93 21:22:54

Rob - I've heard about the glitch/feature that allows segments > 64k in real mode, but like I said, most PC's are spending most of their time in v86 mode these days, where I don't believe it works. And unfortunately, the predominant protected mode spec in use is VCPI; has anybody else tried sifting through that one ? Not the easiest spec I've seen...

Dan - I have been using the 32 bit reg's for math & stuff; I was just hoping somebody knew of a way to do 32-bit addressing from inside v86 mode. Segmented far pointers are a big clock-killer in most of my apps.

- Rod
...........................................................................
Fm: Jaimi McEntire 71700,1202  # 321511
To: Dan Corritore 70243,1110  Date: 27-Mar-93 10:42:03

Oh, one other thing - you need to ignore the segment registers in flat model.
Instead of using ES:DI, you would just use EDI (or any other index or general register for that matter).

Jaimi
...........................................................................
Fm: rod lentz 71163,57  # 320768
To: Rob Nicholson (HMS Ltd) 100060,154  Date: 26-Mar-93 04:42:26

Nope, (real mode != v86) ! Very similar, but not the same. In v86 mode, a "supervising" program is needed to handle details of the virtualization (virtual to physical memory mapping, &c.). Also, from v86 mode you can't take advantage of the glitch/feature of the 386/486's, where you can load the segment limits with large (4 gig !) values in protected mode, switch back to real mode, and then access huge segments from real mode.

- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600  # 321275
To: Serge Mathieu 71035,2771 (X)  Date: 26-Mar-93 22:53:07

Well, for one... Movs to and from memory in Protected mode can take as much as 18 machine cycles as compared to 2 to 5 for a real mode 386 or 486 respectively. This is your MAJOR slowdown. I can't quote other speeds cause I've left my docs at home. But I am sure some of the other memory intensive functions like AND/OR/XOR have the same problem.

Randy Safari
...........................................................................
Fm: Randy @ Safari 71165,3600  # 321274
To: Serge Mathieu 71035,2771 (X)  Date: 26-Mar-93 22:52:45

Ok, here goes.. in REAL mode, you have this...

    +---------------------------+ 0K
    |                           |
    ~                           ~
    |                           |
    +---------------------------+ 640k
    |                           |
    |                           |
    |                           |
    |                           |
    +---------------------------+ Top of memory (max for machine)

The first part is what you can address directly from your program. The second part MUST be addressed through a memory manager and is EXTREMELY slow.
In REAL FLAT MODE, you have the same thing, but (a) a FLAT MODEL HEAP MANAGER allocates far memory quickly in blocks much larger than the page frame (usually 64k) of EMS or XMS, and (b) you can access all of the allocated memory with one instruction.

For example, in real mode, to access memory WITHIN the 640k boundary you must do this (or the same thing some other way<g>)

    asm les di, dword ptr [some_ptr]   ; some_ptr is the FAR address
    asm mov al, byte ptr es:[di]       ; this eats cycles cause of the
                                       ; ES segment override.

In REAL FLAT MODE, you do this

    asm mov ebx, dword ptr [some_ptr]  ; loads all 32 bits into EBX
                                       ; this is also faster than LES DI
    asm mov al, byte ptr [ebx]         ; 5 cycles maximum

In protected mode, you do the same as in REAL FLAT MODE but it takes longer because the processor is handling many tasks (internal and external), as well as watching for segment overruns, at one time.

I'll post more tomorrow.

Randy Safari
...........................................................................
Fm: rod lentz 71163,57  # 321380
To: Randy @ Safari 71165,3600  Date: 27-Mar-93 04:58:29

Randy -

Now, by "real flat mode", I assume you're using the trick of loading up large selectors in protected mode, then switching back to real mode (i.e., the "seg4g" trick) ? Which, as I understand, doesn't work in v86 mode, i.e. anytime emm386 or similar managers are running ?

Also, about your statements re: speed in different modes - to the best of my understanding, all modes operate fairly equally. The big killers are calling through task gates, and switching to/from protected mode (which of course is needed for DOS calls, handling real-mode interrupts, &c.). As far as the processor watching for segment overruns, I believe that's done in all modes; it's just a lot more likely to cause an exception in protected mode. Am I mistaken ?

- Rod
...........................................................................
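As background for the far-pointer versus flat-pointer comparison above, the arithmetic a real-mode segment:offset pair stands for can be written out in C. This is a small sketch with a made-up function name (`far_to_linear`); it only shows the address calculation and says nothing about the cycle counts being debated.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address formation: linear = segment * 16 + offset.
 * A flat-model pointer holds the linear address directly in one
 * 32-bit register, so no segment register load is needed. */
uint32_t far_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, the mode 13H frame buffer at A000:0000 corresponds to linear address 0xA0000, and FFFF:0010 is the first byte past the 1 MB mark.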
Fm: Mark Betz/Ass't SysOp 76605,2346  # 320453
To: Serge Mathieu 71035,2771 (X)  Date: 25-Mar-93 18:53:58

I don't know that EMS is too slow. You can get 64k pages at a time, so it's effectively like having a number of segments on tap. You just have to switch them into the page frame. I'm not very expert on this topic, so I'd best leave specific performance details to others. If Eric Pinnel is lurking he'll tell you that the trend is towards 32-bit flat memory model, and I think he'd be right.

--Mark
...........................................................................
Fm: Rob Nicholson (HMS Ltd) 100060,154  # 320760
To: Mark Betz/Ass't SysOp 76605,2346 (X)  Date: 26-Mar-93 03:43:29

AFAIK, EMS (expanded) is faster than XMS (extended) when used from real mode. With EMS, 64k can be banked into memory almost instantly. With XMS, you keep having to copy chunks of memory backwards and forwards between extended and conventional memory.

Rob.
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346  # 321090
To: Rob Nicholson (HMS Ltd) 100060,154  Date: 26-Mar-93 18:23:02

Hi, Rob. Wouldn't you also have to back-shuttle the EMS page if it changed? I suppose you could use it for read-only stuff.

--Mark
...........................................................................
Fm: Dan Corritore 70243,1110  # 320656
To: Serge Mathieu 71035,2771 (X)  Date: 25-Mar-93 22:31:56

Yeah, I'd opt for flat mode anyday, but EMS and XMS aren't all that bad to use. I used EMS very briefly a while ago, but its speed wasn't that slow. Anyway, you always access it through a certain frame of memory (usually D000-DFFF). I used it to just add a spare 64K of memory to my program (to use just like anything else), but that's not really a good use for it at all.. XMS and the others I really don't know much about, but I'll probably run into them sooner or later...
Actually, my PC INTERN book looks like it has a few good sections on the various ways of accessing extended memory. Oh well, that's all I could do..

_Dan
...........................................................................
Fm: Randy @ Safari 71165,3600  # 321652
To: rod lentz 71163,57 (X)  Date: 27-Mar-93 14:51:54

No. The loading of the "large selectors" or the top 16 bits of the 32 bit registers is not a function of protected mode. Nor is it slower or equally as fast. Real mode 32 bit moves take 2 to 5 cycles depending on the CPU, 2 for 486, up to 5 on a 386. In protected mode, you have bounds checking that occurs inside the processor that takes extra cycles thus causing the incredible lag in MOV times. No state switching occurs except in the startup where the processor is told to ignore segment boundary violations.

Randy
...........................................................................
Fm: rod lentz 71163,57  # 321737
To: Randy @ Safari 71165,3600 (X)  Date: 27-Mar-93 17:23:20

So, does real flat mode work with emm386 (or similar) loaded ? And, if so, do you have any sample code you're willing to share of how to set it up ? Also, what tools are you developing with then ? I assume mostly assembler, to get the addressing modes you need.

As far as the bounds checking & other penalties in protected mode - is this mentioned in the Intel doc's ? I don't remember seeing that mentioned. And state switching should be needed when running protected mode under DOS, to handle hardware interrupts, DOS i/o, and interfacing with all that other real mode code sitting underneath the protected app.

- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600  # 321862
To: rod lentz 71163,57 (X)  Date: 27-Mar-93 21:16:29

No. State switching is not needed. No memory manager, except HIMEM, can be loaded as they all put the system in V86 mode.

The tool is called BCCX32 and is put out by Network Systems Design.
(414) 231-3333 out of Oshkosh (b'gosh<g>), Wisconsin. The guy's name is Jim Dempsey and after looking at his sample code, it looks pretty good.

BCCX32 is a postprocessor that takes your BCC/TCC generated ASM output from the compiler and strokes it, optimizes it, and re-generates 32-bit flat model code that will run as it sits.

I know it sounds bizarre but it works.

Randy Safari
...........................................................................
Fm: rod lentz 71163,57  # 322004
To: Randy @ Safari 71165,3600 (X)  Date: 28-Mar-93 00:59:40

Sounds like an interesting tool. I like the idea of the code post-processor, so you can still use your compiler; nifty !

However, the (expected) memory manager conflict bothers me. In my experience, having the user reconfigure/reboot/&c. is the type of thing that causes many gripes. For some of the "turnkey" systems I work on, it's still a possibility, but for anything aimed at more general release, I shudder. What's your experience in dealing with that ?

- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600  # 322180
To: rod lentz 71163,57 (X)  Date: 28-Mar-93 12:30:24

->experience in dealing with that.

None yet. EPIC's doing it with ZONE 66 and I REALLY don't like the idea but as FLAT model packages become more and more the norm, people WILL get used to it.

Randy
...........................................................................
Fm: Rob Nicholson (HMS Ltd) 100060,154  # 322060
To: Mark Betz/Ass't SysOp 76605,2346 (X)  Date: 28-Mar-93 06:19:29

As most of our use for EMS is for storing bit-maps, I suppose it's read-only and it works quite well. XMS copying is much slower for this purpose.

Rob.

_______________________ Subj: Boolean Sprite Masking _______________________
Fm: TIM 76247,1130  # 343432
To: ALL  Date: 28-Apr-93 14:01:02

Here's one for the blitheads out there... <grin>

I want to merge two sprites, held in character arrays.
(Let's call them A and B -- both "unsigned char".) An individual value of '0' is equivalent to transparency.

Here's the question: is there a set of strictly Boolean operators that will let me merge these two arrays? In other words, C = (A & mask) | B, where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else.

It seems to me that there is no Boolean way to create 'mask' (which looks like the output of a stepping function), but maybe I'm just not thinking clearly enough today. Loops are EVIL. <grin>

Is there a better way?
...........................................................................
Fm: Dan Corritore 70243,1110  # 343771
To: TIM 76247,1130 (X)  Date: 28-Apr-93 22:15:39

I can't think of any logical operations, but this is what I'd do (in C, at least):

    C[n]= B[n] ? B[n] : A[n];

Sorry.. I can't think of any better way than to test for 0!

Now, in assembly, there is a neat trick which can be performed on 486+ processors, which doesn't require a jump. Here it is:

    mov ah,B[n]      ; pseudo-code -ish (implement how you choose)
    mov al,0         ; the 'tester'
    mov bh,A[n]      ; again, pseudo-code -ish
    ;Now, here's the tricky part:
    cmpxchg ah,bh    ; now, ah will equal the correct value
    mov C[n],ah      ; pseudo-code ish..

There you go! Don't understand? Well, here's how it goes.. the 'cmpxchg ah,bh' instruction boils down to this:

    if (AL==AH) AH=BH;   // remember, AL==0
    else AL=AH;          // this part we don't care about..
                         // (what we do care about is that it didn't change AH)

Which equals, using the above code,

    if (B[n]==0) C[n]=A[n];
    else C[n]=B[n];

Do you understand?

_Dan

P.S. Thanks.. I needed to use my brain today! <g>
...........................................................................
Fm: Hans Peter Rushworth 100031,473  # 343849
To: TIM 76247,1130 (X)  Date: 28-Apr-93 23:39:32

>> where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else.
It seems to me that there is no Boolean way to create 'mask'

Dan's way is correct IMO, but since you ask about how to make the mask:

    movzx ax,byte ptr B[?]  ;AL = pixel, AH = 0
    dec   ax                ;AH = 0xFF if B[?] was zero, else 0x00
    inc   al                ;restore AL=pixel

Peter.
...........................................................................

Fm: Dan Corritore 70243,1110 # 344166
To: Hans Peter Rushworth 100031,473 (X) Date: 29-Apr-93 13:14:17

That's a neat technique for creating the 'mask'. I guess I should've
listened to what he was asking more closely (instead of giving him a way to
do it without the mask). There are so many tricks you can do in Assembly
language, which is why we love to program in it, yes?<g>

_Dan
...........................................................................

Fm: Hans Peter Rushworth 100031,473 # 344189
To: Dan Corritore 70243,1110 (X) Date: 29-Apr-93 13:44:03

>> so many tricks you can do in Assembly language, which is why we love to
program in it, yes?<g>

Absolutely! I still think your

    C = B ? B : A;   or   if(!(C=B)) C=A;

is the proper method. But I thought the dec trick was interesting.

Peter.
...........................................................................

Fm: John Dlugosz [ViewPoint] 70007,4657 # 343887
To: TIM 76247,1130 (X) Date: 29-Apr-93 00:17:57

re loops: You need a loop to process the thing anyway. Step through C, A,
and B at the same time, processing one byte at a time. (I assume you have 1
byte per pixel, packed pixel format.) So, create the mask from B just for
that byte, when needed, as part of the main loop. For a hint, look at the
way the compiler generates code for the prefix ! operator. It involves no
jumps.

However, since you need a jump _anyway_ to get back to the top of the loop,
you can double the loop and have the test on B being zero branch to two
different parts of code, one which ORs in B and advances and one which just
advances, and put this _before_ the test, so you still only have exactly one
jump per iteration.
--John
...........................................................................

Fm: Mark 'SAM' Baker 100025,444 # 344146
To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 12:48:49

All these complications. Surely the fastest and most efficient (in terms of
code size) method is :-

            mov al,b[n]
            jnz passover
            mov al,a[n]
    :passover
            mov c[n],al

< this gives you the resultant pixel in c[n], no need to mask or anything >

Mark
...........................................................................

Fm: Hans Peter Rushworth 100031,473 # 344190
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:14

Mark,

> mov al,b[n]
> jnz passover

One small point: unlike nice Motorola processors, the mov instruction does
not affect any flags, so you would need a compare of some sort. Tim's
request included a "no branches" constraint; that's why we are being devious
in our ways. <g>

Peter.
...........................................................................

Fm: John Dlugosz [ViewPoint] 70007,4657 # 344361
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 18:39:32

<<Surely the fastest and most efficient (in terms of code size) method is>>

Smallest code, but not the fastest! Jumps are _expensive_. We go to great
lengths to avoid them. Figure 7 clocks plus pipeline delays for your
"jnz passover". That is half the time it takes to multiply, or longer than
all the rest of the instructions in that loop combined.
...........................................................................

Fm: Dan Corritore 70243,1110 # 344165
To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 13:14:12

Yeah.. I forgot about the ! operator. It should be easy to create a mask
doing that.. as such:

    mask= -!B[?];    // do a 'not' and then negate it

This way, it will be either -1 (0xff) for a zero value or 0 for a non-zero
value.

_Dan
...........................................................................
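For readers following along, here is a minimal C sketch of how the
mask = -!B[?] idea could be combined with Tim's C = (A & mask) | B formula.
The function name merge_sprites and its parameters are made up for
illustration and do not come from anyone's post; it also still uses a loop,
as John points out is unavoidable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions): merge sprite b
   over sprite a into c, where a 0 byte in b means "transparent". */
void merge_sprites(unsigned char *c, const unsigned char *a,
                   const unsigned char *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        /* !b[i] is 1 when b[i] == 0 and 0 otherwise; negating gives -1 or 0,
           which becomes 0xFF or 0x00 when converted to unsigned char. */
        unsigned char mask = (unsigned char)-!b[i];
        /* Where b is 0 the mask lets a through; elsewhere a is cleared
           and b is ORed in unchanged. */
        c[i] = (unsigned char)((a[i] & mask) | b[i]);
    }
}
```

Whether this actually beats the plain test-for-zero version depends on the
compiler and the processor, so it is worth timing both.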
Fm: Mark 'SAM' Baker 100025,444 # 344151
To: TIM 76247,1130 (X) Date: 29-Apr-93 12:54:07

Why do you need to bother with all this masking? In pseudo-assembler, try
this :-

            mov al,b[n]
            jnz not_b
            mov al,a[n]
    not_b:  mov c[n],al

This gives you the correct result, without any recourse to booleans. I think
it is probably also the fastest, and the most efficient in code size that
you will find. OK, a purist wouldn't like the jump (it is a GOTO by any
other name), but you can't avoid them in assembler.

    procedure BIT_SET;
    begin
      C[n]:=B[n];
      if (C[n] = 0) then C[n]:=A[n];
    end;

Mark
...........................................................................

Fm: Hans Peter Rushworth 100031,473 # 344191
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:20

Sorry for the repeat:

>> mov al,b[n]
>> jnz not_b

You must insert a cmp al,0 or "or al,al" to correctly set the zero flag. The
mov instruction does not affect it.

Peter.
...........................................................................
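Going back to Peter's movzx/dec trick from earlier in this thread, the same
idea can be sketched in C for anyone who wants to check it without an
assembler. This is only an illustration: the function name mask_from_pixel is
an assumption, and a compiler is free to generate quite different
instructions from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name is an assumption): returns 0xFF when pixel is 0
   and 0x00 otherwise, mirroring the movzx / dec ax / read AH sequence. */
uint8_t mask_from_pixel(uint8_t pixel)
{
    uint16_t wide = pixel;           /* like movzx ax,byte: high byte is 0  */
    wide = (uint16_t)(wide - 1u);    /* like dec ax: only a zero pixel      */
                                     /* borrows into the high byte          */
    uint8_t mask = (uint8_t)(wide >> 8);  /* like reading AH                */
    assert(mask == 0x00 || mask == 0xFF); /* sanity check on the result     */
    return mask;
}
```

The assembly version then restores the pixel with inc al; in C the original
value is still in the pixel variable, so no such step is needed.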