Text File | 1993-06-24 | 42KB | 980 lines
_____________________________ Subj: VGA Writes _____________________________
Fm: Activision/Infocom 76004,2122 # 193075
To: Dan R Corritore 70243,1110 (X) Date: 29-Jul-92 13:24:20
Here's a question. If I'm writing bytes/words (depending on the card) from
system ram to a VGA mode 13H display, and I interleave some processing
between byte/word writes, will the VGA hardware and the AT-BUS still wait
state me? In other words is:
MOVE VID,AX
JSR SOMEWHERE
MOVE VID,AX
JSR SOMEWHERE
etc...
Faster than...
MOVE VID,AX
MOVE VID,AX
etc.
JSR SOMEWHERE
JSR SOMEWHERE
Thanks,
William Volk
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 193218
To: Activision/Infocom 76004,2122 Date: 29-Jul-92 19:20:06
For a 386 processor (I think also for a 286), there are independent bus and
execution units that are able to overlap execution to some extent. This
means that external cycles should not affect instructions that are
pre-fetched and ready for execution. I'm not sure to what extent the internal
cache of the 486 assists here. So the answer to your question is probably
yes, although I would suggest using this feature to do work on internal
registers rather than CALL subroutines.
I suggest you write a test program to determine how much the improvement is.
About the best way to speed performance when writing words to VGA is to
ensure that the target address is on an even boundary. A simple bit of code
should explain this (part stolen from BC++ memcpy) -
;
; CX = pixel count (!= 0), DS:SI-> source bitmap, ES:DI->VGA RAM
;
        test    di,1            ;odd address ???
        je      short aligned   ;no, begin with even address
        movsb                   ;make it even and move a pixel
        dec     cx              ;fix pixel count (if CX was zero on entry we're in trouble)
aligned:
        shr     cx,1            ;convert to words and set carry if odd number of bytes
        rep movsw               ;copy stuff (doesn't affect carry) aligned so 1 mem cycle
                                ;every time for a 16-bit VGA
        adc     cx,cx           ;inc CX (set it to 1) if carry set (ends on odd address)
        rep movsb               ;NOTE: doesn't do move if CX is zero
The separate bus I/F overlap and pre-fetch will help absorb the extra
instructions. Hope that helps.
Peter.
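For readers who prefer C, the same idea can be sketched roughly as follows. This is only an illustration of the align-then-copy-words pattern, not the BC++ memcpy source; the function name copy_word_aligned is made up for this example, and unlike the assembly it tolerates a count of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy count bytes so that the bulk of the transfer is done as 16-bit
   stores to even destination addresses. Hypothetical helper, for
   illustration only. */
void copy_word_aligned(uint8_t *dst, const uint8_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    /* Odd destination: move one byte first so dst becomes even. */
    if (count > 0 && ((uintptr_t)dst & 1u)) {
        *dst++ = *src++;
        count--;
    }

    /* Bulk copy, one 16-bit word per iteration (the REP MOVSW step). */
    for (size_t words = count / 2; words > 0; words--) {
        uint16_t w;
        memcpy(&w, src, sizeof w);  /* source may still be unaligned */
        memcpy(dst, &w, sizeof w);  /* destination is even here */
        src += 2;
        dst += 2;
    }

    /* Trailing byte if the remaining count was odd (the ADC / REP MOVSB step). */
    if (count & 1u) {
        *dst = *src;
    }
}
```

Whether the compiler really emits one word store per iteration depends on the compiler and target, so treat this as a description of the access pattern rather than a tuned routine.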
...........................................................................
Fm: John W. Ratcliff 70253,3237 # 270346
To: Serge Mathieu 71035,2771 (X) Date: 30-Dec-92 14:11:11
Serge,
About the VGA screen copy stuff.
I do a REPE CMPSD, to do a double word compare to find all double words that
are the same. Then I do a REPNZ CMPSD to find the number of double words
that are different. Then back off the pointers, and move only the double
words that are changed to screen ram, and your system ram copy of screen ram.
Then back up to the top of the loop. I know it sounds crazy but VRAM is so
slow that this turns out to be anywhere from a lot faster to many, many,
times faster on really slow VGA cards. A REPE CMPSW works just as well, but
use those 32 bit ops whenever you can. The disadvantage to this method is
you use up more of your precious system ram. The advantages are huge, and I
hope obvious.
I haven't really thought about this approach for a panning/scrolling type
environment. I think more from the simulation standpoint where you are
re-rendering an entire screen, at extremely high frame rate, and most of the
pixels are the same color from frame to frame, just by the nature of your
rendering system. If most of the pixels are changing, then this method
sucks. My wireframe demo points this out quite nicely.
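The compare-then-copy loop described above might look something like this in C. It is a sketch under assumptions: the names blit_changed_dwords, screen, shadow and frame are invented here, and plain loops stand in for the REPE/REPNZ CMPSD scans.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write only the dwords of `frame` that differ from `shadow` (the system-RAM
   copy of the screen) to both `screen` and `shadow`. Returns the number of
   dwords actually written to `screen`. Hypothetical names throughout. */
size_t blit_changed_dwords(uint32_t *screen, uint32_t *shadow,
                           const uint32_t *frame, size_t n)
{
    size_t written = 0;
    size_t i = 0;

    assert(screen != NULL && shadow != NULL && frame != NULL);

    while (i < n) {
        /* Skip the run of identical dwords. */
        while (i < n && frame[i] == shadow[i]) {
            i++;
        }
        /* Copy the run of changed dwords to video RAM and to the shadow. */
        while (i < n && frame[i] != shadow[i]) {
            screen[i] = frame[i];
            shadow[i] = frame[i];
            written++;
            i++;
        }
    }
    return written;
}
```

The return value makes the trade-off visible: when it approaches n, nearly every dword changed and the extra compare pass is wasted work, which is the case the post warns about.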
...........................................................................
Fm: Randy @ Safari 71165,3600 # 283754
To: Serge Mathieu 71035,2771 Date: 22-Jan-93 19:49:11
I don't know if you got a solid reply yet but on all my systems, when
blitting to video memory, writing words to odd addresses will slow the blit
down by as much as 50%, depending on the video card.
In normal RAM, odd word writes will cause a penalty of 33%, regardless of
cache or processor speed. This is because instead of accessing the MMU
(memory management unit) and writing once, the instruction has to write,
access the MMU, and write again.
...........................................................................
Fm: John Dlugosz [ViewPoint] 70007,4657 # 341935
To: VOR Technologies Inc 71333,134 (X) Date: 26-Apr-93 08:38:36
A word-aligned MOVSD does not save that much over MOVSW's. The video bus is
the bottleneck and it still copies the same number of bytes.
I used to know how many bus cycles it took for different kinds of transfers,
but I've forgotten. The numbers I just got empirically indicate 5 or 6 bus
cycles per word. That seems high. But it depends on the card's buffer for
receiving data before actually storing it in its own RAM, and how fast that
can get processed depends on the card's timing speed (ram will be busy
displaying and can't be stored to) as well as how the card was made. A read
cycle, I recall, takes 850 to 1050 ns all told (but reads can't use the
buffer).
A single function call will take as much as 20 machine clocks (on a '386),
which is _nothing_ compared to the time of copying the memory to video. About
5 pixels worth!
--John
______________________ Subj: Local Variables On Stack ______________________
Fm: Hans Peter Rushworth 100031,473 # 261886
To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 14-Dec-92 22:26:11
On the subject of tweaks, have you tried aligning all your local variables so
the double words and words are all aligned? You can do this by ordering your
locals so the dwords, words and bytes are all grouped together, then clearing
the lower two bits of the BP (you also have to ensure that there is a dummy
dword variable at the bottom of the stack frame to cater for the potential
"drop" of the BP). You also have to copy the parameters (if any) to the locals
area.
I think this is potentially worthwhile for those functions that exist for a
longish period of time, and where you "run out" of registers.
Peter.
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 262001
To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 15-Dec-92 03:02:28
>> What's the effect of clearing the low 2-bits of BP? You're just
subtracting a max of decimal 2 from the value, correct?
Actually a max of 3 <g>. The reason behind this is that it makes the SS:BP
point at a long word address boundary. This means that if (say) you were to
execute the instruction:
LES SI, DWORD PTR [BP-4] ; (for sake of argument) reads 4 bytes of data
then a 32 bit wide data bus CPU only needs to do 1 memory bus cycle to read
or write the 4 bytes of data. For this to be effective, you would organise
your local variables so that all the dwords are at the top of the stack
frame, all the words are under that, and finally all the bytes under that.
This then guarantees that when you access a local variable, the minimum
number of memory cycles are needed. Sometimes when the function is called the
stack will be correctly aligned, and this makes no difference, but other
times it may not.
Peter.
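The arithmetic of the mask can be checked with a few lines of C. This is just a model of AND BP,0FFFCh on a 16-bit value; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low two bits of a 16-bit frame pointer (AND BP,0FFFCh). */
uint16_t align_frame_pointer(uint16_t bp)
{
    uint16_t aligned = (uint16_t)(bp & 0xFFFCu);
    assert((aligned & 3u) == 0);    /* now on a dword boundary */
    return aligned;
}

/* How far the frame pointer moved down: always 0, 1, 2 or 3 bytes. */
unsigned frame_pointer_drop(uint16_t bp)
{
    return (unsigned)(bp - align_frame_pointer(bp));
}
```

For instance, a BP of 11CAh drops by 2 to 11C8h, and 11CBh drops by the maximum of 3. That worst case is why the dummy dword at the bottom of the frame is needed.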
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346 # 262161
To: Hans Peter Rushworth 100031,473 (X) Date: 15-Dec-92 13:42:06
Right, 3. That's two bits worth, correct? <g>. Let me make sure I understand
this: the idea is that the stack will be dword aligned for 32-bit accesses,
and that it won't make any difference to byte-wide or word-wide accesses. One
thing still confuses me (p'raps more than one). Let's say that you have a
stack frame that looks like this on entry to a function:
    dword   <- BP + 14
    dword   <- BP + 10
    word    <- BP + 8
    word    <- BP + 6
    byte    <- BP + 4
    byte    <- BP + 2
    word    <- saved BP
Suppose that BP == 11CA. Clear the lower two bits and you have 11C8. Now BP
points to the word right below the saved BP. Do you simply add in an offset
in order to correctly address the stack values now? MOV AH,[BP+4] gets the
first byte parameter from the stack, instead of MOV AH,[BP+2]. You basically
have to add 2 to all of your offsets. Is that how you'd handle it?
--Mark
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 262239
To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 15-Dec-92 16:10:10
The precise location of the locals [BP-n] doesn't matter naturally, but the
function parameters cannot be accessed using [BP+n], so you need to copy them
into the local data area, and use the local copies. Just copy the BP into
BX (for example) before masking it, and then use BX to move the parameters.
Two other things:
(1) you have to allocate an extra (dummy) long word local at the bottom of
the local stack frame, so that if BP is decremented by 3 accessing the
bottom local won't blow the stack.
(2) You need to save the original (unmasked) BP so that you can copy this
value back into SP at the end of the function, (normally you copy BP->SP),
instead you just load SP from this saved value (which could be another
local). Example:
    push    bp              ;save stack frame
    mov     bp,sp           ;new stack frame
    mov     bx,bp           ;keep copy of original frame
    and     bp,0FFFCh       ;align BP
    sub     sp,FRAMESIZE    ;allocate space
    push    di              ;save register variables
    push    si              ;
    mov     [bp-SAVEBP],bx  ;save original BP
    mov     ax,ss:[bx+6]    ;get arg1
    mov     [bp-param1],ax  ;copy it ... same for other args
    --- rest of function ----
    pop     si              ;restore register vars
    pop     di              ;
    mov     sp,[bp-SAVEBP]  ;original frame
    pop     bp              ;caller's bp
    retf                    ;exit
Peter.
_____________________ Subj: Function Arguments on Stack _____________________
Fm: Hans Peter Rushworth 100031,473 # 261555
To: Jesse 76646,3302 (X) Date: 14-Dec-92 14:10:26
>> what IS pushed by a function automatically?
Normal function:
High memory
    argn              <--- calling function pushes arguments in reverse order
    arg2                   e.g. f( arg1, arg2, ..., argn )
    arg1
    <return address>  <--- the CALL instruction pushes the IP or CS:IP (model
                           dependent) of the next instruction of the calling func
_______________________ENTERS NEW FUNCTION_________________________________
    BP                <--- The called function saves the old base pointer
                           and moves this stack address into the new BP
    <local variables> <--- The stack pointer is adjusted to make space
                           for the function's local (automatic) variables
    DI                <--- The called function pushes register variables
SP: SI                     (I may have the order wrong here)
    <rest of stack>   <--- used for temporary pushes and pops, other function
                           calls and interrupts.
Low memory
Cleanup on exit: first SI and DI are popped, then BP is copied back into
SP, SP now points at the calling function's BP on the stack. BP is popped
and a RET is performed, returning to the calling function. The calling
function will then usually do an ADD SP,n to "remove" the arguments it placed
on the stack. The function return value is placed in AX or DX:AX depending
on the size.
The called function accesses variables using BP, positive offsets are used
for the function's actual parameters, and negative ones for the locals.
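As a rough check on the positive offsets, here is a small C sketch of where an argument lands relative to BP. It assumes the standard PUSH BP / MOV BP,SP entry sequence and 2-byte arguments; arg_bp_offset is a name invented for this example.

```c
#include <assert.h>

/* Offset from BP of the k-th 2-byte argument (k = 1 for arg1), assuming
   the usual PUSH BP / MOV BP,SP prologue. Illustrative helper only. */
int arg_bp_offset(int k, int far_call)
{
    int saved_bp = 2;                       /* the caller's BP pushed on entry */
    int return_address = far_call ? 4 : 2;  /* CS:IP for far, IP for near */

    assert(k >= 1);
    return saved_bp + return_address + (k - 1) * 2;
}
```

So arg1 sits at [BP+4] in a near function and at [BP+6] in a far one. Wider arguments (longs, far pointers) shift everything after them accordingly.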
When an interrupt occurs (hardware or software) the following ends up on
the stack. The CPU itself pushes only FLAGS, CS and IP; the rest is pushed
by the entry code the compiler generates for an interrupt function.
    FLAGS register    <--- after the push, interrupts are masked (IF is cleared)
    CS
    IP
    AX                <--- pushed by the function's entry code from here down
    BX
    CX
    DX
    ES
    DS
    SI                <--- Register variables are automatically saved;
    DI                     the function does not save or restore them separately
    BP                <--- The SP at this address is copied into the BP below
____________________________________ Enters interrupt handler_______
    <local variables> <--- The function does a SUB SP,n to allocate space
                           for local variables, and sets up DS to point to
                           the handler's data segment.
SP:
    <rest of stack>   <--- used for push pop etc
Cleanup: The BP is copied back into the SP, the saved registers are popped,
and an IRET instruction is executed, which reloads IP, CS and FLAGS. (The
function may modify the registers on the stack to return values if this is a
software interrupt, but for hardware interrupts the registers on the stack
must be READ ONLY.)
I hope that is more or less a correct description.
Peter.
BTW, did you realise that the LOOP instruction on a 386/486 is actually SLOWER
than the equivalent separate decrement and branch instructions?
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346 # 266171
To: Jesse 76646,3302 (X) Date: 22-Dec-92 11:35:30
Hi, Jesse. If you're saving all of the registers, then there's probably no
harm in using PUSHA/POPA, unless there are some hidden side effects that I'm
not aware of. However, there are registers that you don't need to save, even
if you're using them. AX is one, since the compiler expects it to be used for
return values. Also, SP isn't really restored, since its value is discarded
by the POPA instruction, not copied back into the register. So there's 2 that
the instruction isn't needed for. That leaves 6. A PUSHA takes 18 clocks on
the 386, while PUSH only requires 2. POPA takes 24 clocks on the 386, and POP
takes 4. So you're wasting 6 clocks on the PUSHA, but the POPA works out
even. If the function is one that is called in a tight loop, say 10,000
times, then you're blowing off 60,000 clocks <g>.
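The clock arithmetic above can be written out as a small C sketch. The cycle counts are simply the ones quoted in the post (not checked here against Intel's tables), and the function name is made up.

```c
#include <assert.h>

/* 386 clock counts as quoted in the post above (assumed, not verified). */
enum { PUSHA_CLOCKS = 18, PUSH_CLOCKS = 2, POPA_CLOCKS = 24, POP_CLOCKS = 4 };

/* Extra clocks per call spent by PUSHA/POPA compared with pushing and
   popping only `regs` registers individually. A negative result means
   PUSHA/POPA is the cheaper choice. */
int pusha_popa_overhead(int regs)
{
    assert(regs >= 0 && regs <= 8);
    return (PUSHA_CLOCKS - regs * PUSH_CLOCKS)
         + (POPA_CLOCKS - regs * POP_CLOCKS);
}
```

With six registers the overhead is 6 clocks per call, or 60,000 over 10,000 calls. With all eight, by the same figures, PUSHA/POPA comes out ahead.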
_______________________ Subj: 386 Instruction Timing _______________________
Fm: KGliner 70363,3672 # 318190
To: all Date: 21-Mar-93 22:53:29
A simple asm question for you all:
On a 386, do these instructions take the same amount of time or is one
faster than the other:
mov [si],al
mov [si + 512],al
...........................................................................
Fm: Mike W. Smith 75300,3434 # 318283
To: KGliner 70363,3672 (X) Date: 22-Mar-93 01:59:24
KG>On a 386, do these instructions take the same amount of time or is one
KG>faster than the other:
KG> mov [si],al
KG> mov [si + 512],al
On a 386, a "MOV mem,reg" is 2 clock cycles for any effective address.
...........................................................................
Fm: Randy @ Safari 71165,3600 # 318977
To: Mike W. Smith 75300,3434 (X) Date: 23-Mar-93 09:20:46
Except when an offset is used and that takes 1 cycle per byte of offset. If
the offset is BYTE, one cycle. If the offset is WORD, two cycles.
At least that's what Turbo Profiler says when I run the test.
Randy
...........................................................................
Fm: Mike W. Smith 75300,3434 # 319385
To: Randy @ Safari 71165,3600 Date: 23-Mar-93 23:50:43
RS>Except when an offset is used and that takes 1 cycle per byte of offset.
RS>If the offset is BYTE, one cycle. If the offset is WORD, two cycles.
That's different from what my tech reference says. A MOV reg,mem takes the
same time whether it's a byte or word. For 8086/88 processors the effective
address adds anywhere from 5 to 14 clock cycles to an instruction. For
286/386 processors, the only case where an extra clock is added is when all
three indexing elements are used (base, index, and displacement).
...........................................................................
Fm: Randy @ Safari 71165,3600 # 320207
To: Mike W. Smith 75300,3434 (X) Date: 25-Mar-93 09:22:21
->That's different from what my tech reference says. A MOV reg,mem
->takes the same time whether it's a byte or word
That's true, except when you use an offset like [si+512] which causes an ADD
to be performed prior to the fetch. My timings are as such
10,000 iterations (includes loop time)
mov al,[si] .0047
mov al,[si+512] .0051/.0052 (fluctuated)
That was the original question, right?
_____________________________ Subj: 32 Bit Code _____________________________
Fm: Sarwan Narine 76675,164 # 319332
To: All Date: 23-Mar-93 22:17:49
Consider the following instructions:
#1. MOV AL, DS:[SI]
#2. MOV AL, DS:[ESI]
These instructions can be used to accomplish the same task. However, under
certain circumstances instruction #2 will fail. If SMARTDRV is _not_ loaded
then instruction #2 causes a hang-up, however, a CTRL-ALT-DEL will reset my
system. What does SMARTDRV do to enable instruction #2 to execute? BTW, my
program is written for 32-bit CPUs only. Thanks for any insight.
...........................................................................
Fm: Jaimi McEntire 71700,1202 # 319345
To: Sarwan Narine 76675,164 Date: 23-Mar-93 22:38:05
Sarwan, if your code before that loaded SI (because you wanted a word), the
ESI register could have trash in the upper 16 bits. In that case, you would
definitely need to either clear out ESI (mov esi,0) before loading it, or
you would need to zero-extend it as you moved it (movzx). Just as a side
note, you can of course use any 32 bit register as an index on the 386, if
you did not know that. Also, you can use FS and GS too. All you need to do
(if you are using Borland C) is compile by assembly. P.S. Smartdrv probably
enables #2 to execute because it clears the registers, because it too has
32 bit code.
Jaimi
...........................................................................
Fm: Bruce Nehlsen 76535,2466 # 319363
To: Jaimi McEntire 71700,1202 (X) Date: 23-Mar-93 23:00:26
Sarwan -
Another comment, since I had the same problem, except my code would CRASH if
and only if EMM386 was loaded.
Anyway, in my case it turned out to be my assembly directives. In some
modules I used <.code, .data> , and in some I used < DATA SEG ">. In one of
those 2 methods, the assembler was inserting ENTER and LEAVEs, which made a
big mess, since I already had those in there.
Bottom line - check the .LST file, and ensure that what you WROTE is what the
assembler generated.
Later...
...........................................................................
Fm: Dan Corritore 70243,1110 # 319394
To: Sarwan Narine 76675,164 Date: 24-Mar-93 00:05:40
Another thing I'd like to add to what the others have said is that you can't
use a value greater than 65535 in ESI if you have not changed the segment
limit stuff on the computer. The 386 (or higher) computer starts up with the
segment limits set to be 65535. If you can, always debug 386-specific code
with a 386-specific debugger.. it'll allow you to see things other debuggers
won't (and also capture exceptions --one of which is activated by using
invalid segment limits).
_Dan
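To make the limit check concrete, here is a minimal C model of it. It assumes the default 65535 limit mentioned above and the usual real-mode address formation (segment times 16 plus offset); the function and macro names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_SEGMENT_LIMIT 0xFFFFu   /* limit a 386 starts up with */

/* Linear address formed from a real-mode segment:offset pair. */
uint32_t real_mode_linear(uint16_t segment, uint32_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Nonzero if a `size`-byte access at `offset` stays within `limit`,
   where `limit` is the highest valid offset in the segment. */
int access_within_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
    assert(size >= 1);
    return offset <= limit && size - 1 <= limit - offset;
}
```

With the default limit, MOV AL,DS:[ESI] is fine while ESI holds 0FFFFh or less, and fails the check as soon as stray upper bits push ESI past that, which is when the processor raises an exception.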
...........................................................................
Fm: rod lentz 71163,57 # 319438
To: Dan Corritore 70243,1110 (X) Date: 24-Mar-93 02:53:58
re: segment limits, &c...
By my understanding, most machines running under DOS these days
are actually running in a virtual 86 (managed by emm386 or similar)
most of the time. In v86 mode, as I understand it, the segment limit
is always 0xffff. Therefore, without switching to protected mode
(via VCPI/DPMI/whatever), there shouldn't be any advantage to using
a dword subscript (such as [esi]).
Anybody care to confirm/refute this ?
- Rod
...........................................................................
Fm: Rob Nicholson (HMS Ltd) 100060,154 # 319468
To: rod lentz 71163,57 (X) Date: 24-Mar-93 05:40:20
There appears to be a 'fudge' that allows the segments to be >65535 in real
mode. One of the memory managers or disk caches (can't remember which) left
the segments unbounded.
Rob.
...........................................................................
Fm: Dan Corritore 70243,1110 # 319725
To: rod lentz 71163,57 (X) Date: 24-Mar-93 15:57:14
You are right. There is no advantage at all to using ESI over SI
without going into protected mode and switching the segment limit stuff
yourself (or having one of those DOS extenders, I believe). If you have to do
it yourself, get a 386 or 486 specific book which deals with that kind of
stuff(or both).. I'm planning on doing so one day when I feel the need for
stuff like that. Well, anyway, that stuff doesn't stop you from using the
32-bit registers and 32-bit instructions, though, so play with them all you
want!<g>
_Dan
...........................................................................
Fm: rod lentz 71163,57 # 319954
To: Dan Corritore 70243,1110 Date: 24-Mar-93 21:22:54
Rob - I've heard about the glitch/feature that allows segments
> 64k in real mode, but like I said, most PC's are spending most of
their time in v86 mode these days, where I don't believe it works.
And unfortunately, the predominant protected mode spec in use is
VCPI; has anybody else tried sifting through that one ? Not the
easiest spec I've seen...
Dan - I have been using the 32 bit reg's for math & stuff; I was
just hoping somebody knew of a way to do 32-bit addressing from
inside v86 mode. Segmented far pointers are a big clock-killer in
most of my apps.
- Rod
...........................................................................
Fm: Jaimi McEntire 71700,1202 # 321511
To: Dan Corritore 70243,1110 Date: 27-Mar-93 10:42:03
Oh, one other thing - you need to ignore the segment registers in flat model.
instead of using ES:DI, you would just use EDI. (or any other index or
general register for that matter).
Jaimi
...........................................................................
Fm: rod lentz 71163,57 # 320768
To: Rob Nicholson (HMS Ltd) 100060,154 Date: 26-Mar-93 04:42:26
Nope, (real mode != v86) ! Very similar, but not the same.
In v86 mode, a "supervising" program is needed to handle details of
the virtualization (virtual to physical memory mapping, &c.).
Also, from v86 mode you can't take advantage of the glitch/feature
of the 386/486's, where you can load the segment limits with large
(4 gig !) values in protected mode, switch back to real mode, and
then access huge segments from real mode.
- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600 # 321275
To: Serge Mathieu 71035,2771 (X) Date: 26-Mar-93 22:53:07
Well, for one...
Movs to and from memory in Protected mode can take as much as 18
machine cycles as compared to 2 (486) to 5 (386) in real mode.
This is your MAJOR slowdown.
I can't quote other speeds cause I've left my docs at home. But I am sure
some of the other memory intensive functions like AND/OR/XOR have the same
problem.
Randy
Safari
...........................................................................
Fm: Randy @ Safari 71165,3600 # 321274
To: Serge Mathieu 71035,2771 (X) Date: 26-Mar-93 22:52:45
Ok, here goes..
in REAL mode, you have this...
+---------------------------+ 0K
| |
~ ~
| |
+---------------------------+ 640k
| |
| |
| |
| |
+---------------------------+ Top of memory (max for machine)
the first part is what you can address directly from your program. The
second part MUST be addressed through a memory manager and is EXTREMELY slow.
In REAL FLAT MODE, you have the same thing, but (a) a FLAT MODEL HEAP MANAGER
allocates far memory quickly in blocks much larger than the page frame
(usually 64k) of EMS or XMS, and (b) you can access all of the allocated
memory with one instruction.
for example, in real mode, to access memory WITHIN the 640k boundary you must
do this (or the same thing some other way<g>)
asm les di, dword ptr [some_ptr]   ; some_ptr is the FAR address
asm mov al, byte ptr es:[di]       ; this eats cycles cause of the
                                   ; ES segment override.
in REAL FLAT MODE, you do this
asm mov ebx, dword ptr [some_ptr]  ; loads all 32 bits into EBX
                                   ; this is also faster than LES DI
asm mov al, byte ptr [ebx]         ; 5 cycles maximum
in protected mode, you do the same as in REAL FLAT MODE but it takes longer
because the processor is handling many tasks (internal and external), as well
as watching for segment overruns, at one time.
I'll post more tomorrow.
Randy
Safari
...........................................................................
Fm: rod lentz 71163,57 # 321380
To: Randy @ Safari 71165,3600 Date: 27-Mar-93 04:58:29
Randy -
Now, by "real flat mode", I assume you're using the trick of
loading up large selectors in protected mode, then switching back
to real mode (i.e., the "seg4g" trick) ? Which, as I understand,
doesn't work in v86 mode, i.e. anytime emm386 or similar managers
are running ?
Also, about your statements re:speed in different modes -
to the best of my understanding, all modes operate fairly equally.
The big killers are calling through task gates, and switching to/from
protected mode (which of course is needed for DOS calls, handling
real-mode interrupts, &c.). As far as the processor watching for
segment overruns, I believe that's done in all modes; it's just
a lot more likely to cause an exception in protected mode. Am I
mistaken ?
- Rod
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346 # 320453
To: Serge Mathieu 71035,2771 (X) Date: 25-Mar-93 18:53:58
I don't know that EMS is too slow. You can get 64k pages at a time, so it's
effectively like having a number of segments on tap. You just have to switch
them into the page frame. I'm not very expert on this topic, so I'd best
leave specific performance details to others. If Eric Pinnel is lurking
he'll tell you that the trend is towards 32-bit flat memory model, and I
think he'd be right.
--Mark
...........................................................................
Fm: Rob Nicholson (HMS Ltd) 100060,154 # 320760
To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 26-Mar-93 03:43:29
AFAIK, EMS (expanded) is faster than XMS (extended) when used from real mode.
With EMS, 64k can be banked into memory almost instantly. With XMS, you keep
having to copy chunks of memory backwards and forwards between extended and
conventional memory.
Rob.
...........................................................................
Fm: Mark Betz/Ass't SysOp 76605,2346 # 321090
To: Rob Nicholson (HMS Ltd) 100060,154 Date: 26-Mar-93 18:23:02
Hi, Rob. Wouldn't you also have to back-shuttle the EMS page if it changed? I
suppose you could use it for read-only stuff.
--Mark
...........................................................................
Fm: Dan Corritore 70243,1110 # 320656
To: Serge Mathieu 71035,2771 (X) Date: 25-Mar-93 22:31:56
Yeah, I'd opt for flat mode anyday, but EMS and XMS aren't all that bad to
use. I used EMS very briefly a while ago, but its speed wasn't that slow.
Anyway, you always access it through a certain frame of memory (usually
D000-DFFF). I used it to just add a spare 64K of memory to my program (to use
just like anything else), but that's not really a good use for it at all..
XMS and the others I really don't know much about, but I'll probably run into
them sooner or later... actually, my PC INTERN book looks like it has a few
good sections on the various ways of accessing extended memory . Oh well,
that's all I could do..
_Dan
...........................................................................
Fm: Randy @ Safari 71165,3600 # 321652
To: rod lentz 71163,57 (X) Date: 27-Mar-93 14:51:54
No. The loading of the "large selectors" or the top 16 bits of the 32 bit
registers is not a function of protected mode. Nor is it slower or equally as
fast.
Real mode 32 bit moves take 2 to 5 cycles depending on the CPU, 2 for 486, up
to 5 on a 386.
In protected mode, you have bounds checking that occurs inside the processor
that takes extra cycles thus causing the incredible lag in MOV times. No
state switching occurs except in the startup where the processor is told to
ignore segment boundary violations.
Randy
...........................................................................
Fm: rod lentz 71163,57 # 321737
To: Randy @ Safari 71165,3600 (X) Date: 27-Mar-93 17:23:20
So, does real flat mode work with emm386 (or similar) loaded ?
And, if so, do you have any sample code you're willing to share
of how to set it up ?
Also, what tools are you developing with then ? I assume mostly
assembler, to get the addressing modes you need.
As far as the bounds checking & other penalties in protected
mode - is this mentioned in the Intel doc's ? I don't remember seeing
that mentioned.
And state switching should be needed when running protected mode
under DOS, to handle hardware interrupts, DOS i/o, and interfacing
with all that other real mode code sitting underneath the protected
app.
- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600 # 321862
To: rod lentz 71163,57 (X) Date: 27-Mar-93 21:16:29
No. State switching is not needed.
No memory manager, except HIMEM can be loaded as they all put the system in
V86 mode.
The tool is called BCCX32 and is put out by Network Systems Design.
(414) 231-3333 out of Oshkosh (b'gosh<g>), Wisconsin.
The guy's name is Jim Dempsey and after looking at his sample code, it looks
pretty good.
BCCX32 is a postprocessor that takes your BCC/TCC generated ASM output from
the compiler and strokes it, optimizes it, and re-generates 32-bit flat model
code that will run as it sits.
I know it sounds bizarre but it works.
Randy Safari
...........................................................................
Fm: rod lentz 71163,57 # 322004
To: Randy @ Safari 71165,3600 (X) Date: 28-Mar-93 00:59:40
Sounds like an interesting tool. I like the idea of the code
post-processor, so you can still use your compiler; nifty !
However, the (expected) memory manager conflict bothers me.
In my experience, having the user reconfigure/reboot/&c. is the
type of thing that causes many gripes. For some of the "turnkey"
systems I work on, it's still a possibility, but for anything aimed
at more general release, I shudder. What's your experience in
dealing with that ?
- Rod
...........................................................................
Fm: Randy @ Safari 71165,3600 # 322180
To: rod lentz 71163,57 (X) Date: 28-Mar-93 12:30:24
->experience in dealing with that.
None yet. EPIC's doing it with ZONE 66 and I REALLY don't like the
idea but as FLAT model packages become more and more the norm, people
WILL get used to it.
Randy
...........................................................................
Fm: Rob Nicholson (HMS Ltd) 100060,154 # 322060
To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 28-Mar-93 06:19:29
As most of our use for EMS is for storing bit-maps, I suppose it's read-only
and it works quite well. XMS copying is much slower for this purpose.
Rob.
_______________________ Subj: Boolean Sprite Masking _______________________
Fm: TIM 76247,1130 # 343432
To: ALL Date: 28-Apr-93 14:01:02
Here's one for the blitheads out there... <grin> I want to merge two
sprites, held in character arrays. (Let's call them A and B -- both "unsigned
char".) An individual value of '0' is equivalent to transparency.
Here's the question: is there a set of strictly Boolean operators that will
let me merge these two arrays? In other words,
C = (A & mask) | B,
where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else.
It seems to me that there is no Boolean way to create 'mask' (which looks
like the output of a stepping function), but maybe I'm just not thinking
clearly enough today.
Loops are EVIL. <grin> Is there a better way?
...........................................................................
Fm: Dan Corritore 70243,1110 # 343771
To: TIM 76247,1130 (X) Date: 28-Apr-93 22:15:39
I can't think of any logical operations, but this is what I'd do (in C, at
least):
C[n]= B[n] ? B[n] : A[n];
Sorry.. I can't think of any better way than to test for 0!
Now, in assembly, there is a neat trick which can be performed on 486+
processors, which doesn't require a jump. Here it is:
mov ah,B[n] ; pseudo-code -ish (implement how you choose)
mov al,0 ; the 'tester'
mov bh,A[n] ; again, pseudo-code -ish
;Now, here's the tricky part:
cmpxchg ah,bh ; now, ah will equal the correct value
mov C[n],ah ; pseudo-code ish..
There you go! Don't understand? Well, here's how it goes.. the
'cmpxchg ah,bh' instruction boils down to this:
if (AL==AH) AH=BH; // remember, AL==0
else AL=AH; // this part we don't care about..
// (what we do care about is that it didn't change AH)
Which equals, using the above code,
if (B[n]==0) C[n]=A[n];
else C[n]=B[n];
Do you understand?
_Dan
P.S. Thanks.. I needed to use my brain today! <g>
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 343849
To: TIM 76247,1130 (X) Date: 28-Apr-93 23:39:32
>> where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else. It
seems to me that there is no Boolean way to create 'mask'
Dan's way is correct IMO, but since you ask about how to make the mask:
movzx ax,byte ptr B[?] ;AL = pixel, AH = 0
dec ax ;AH = 0xFF if B[?] was zero, else 0x00
inc al ;restore AL=pixel
Peter.
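The same dec trick can be sketched in C, assuming a 16-bit unsigned variable standing in for AX. The function names mask_from_dec and mask_from_dec_example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "movzx ax,b / dec ax": the high byte of the result is the mask. */
uint8_t mask_from_dec(uint8_t b)
{
    uint16_t ax = b;              /* movzx: high byte 0, low byte = pixel */
    ax = (uint16_t)(ax - 1);      /* dec ax: borrows into the high byte only if b == 0 */
    return (uint8_t)(ax >> 8);    /* AH: 0xFF if b was zero, else 0x00 */
}

/* Usage: only a zero pixel produces an all-ones mask. */
void mask_from_dec_example(void)
{
    assert(mask_from_dec(0x00) == 0xFF);
    assert(mask_from_dec(0x01) == 0x00);
}
```

The borrow out of the low byte is what does the work: decrementing any non-zero pixel never touches the high byte.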
...........................................................................
Fm: Dan Corritore 70243,1110 # 344166
To: Hans Peter Rushworth 100031,473 (X) Date: 29-Apr-93 13:14:17
That's a neat technique for creating the 'mask'. I guess I should've listened
to what he was asking more closely. (instead of giving him a way to do it
without the mask). There's so many tricks you can do in Assembly language,
which is why we love to program in it, yes?<g>
_Dan
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 344189
To: Dan Corritore 70243,1110 (X) Date: 29-Apr-93 13:44:03
>> so many tricks you can do in Assembly language, which is why we love to
program in it, yes?<g>
Absolutely!
I still think your C = B ? B : A; or if(!(C=B)) C=A;
is the proper method. But I thought the dec trick was interesting.
Peter.
...........................................................................
Fm: John Dlugosz [ViewPoint] 70007,4657 # 343887
To: TIM 76247,1130 (X) Date: 29-Apr-93 00:17:57
re loops: You need a loop to process the thing anyway. Step through C, A,
and B at the same time, processing one byte. (I assume you have 1 byte per
pixel, packed pixel format)
So, create the mask from B just for that byte, when needed, as part of the
main loop.
For a hint, look at the way the compiler generates code for the prefix !
operator. It involves no jumps.
However, since you need a jump _anyway_ to get back to the top of the loop,
you can double the loop: have the test on B being zero branch to one of two
different parts of the code, one that ORs in B and advances and one that just
advances, and put this _before_ the test, so you still have exactly one jump
per iteration.
--John
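As a rough sketch of the first suggestion (build the mask for each byte inside the main loop), assuming one byte per pixel and three equally sized arrays. The function name merge_sprites is hypothetical, and whether the comparison compiles to jump-free code depends on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C[i] = (A[i] & mask) | B[i], with the mask derived from B[i] on the fly. */
void merge_sprites(uint8_t *c, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(c && a && b);
    for (size_t i = 0; i < n; i++) {
        /* (b[i] == 0) is 1 or 0; negating gives all ones or all zeros. */
        uint8_t mask = (uint8_t)-(b[i] == 0);
        c[i] = (uint8_t)((a[i] & mask) | b[i]);
    }
}
```

The only jump left is the loop's own back edge; the per-pixel decision is folded into the mask.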
...........................................................................
Fm: Mark 'SAM' Baker 100025,444 # 344146
To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 12:48:49
All these complications.
Surely the fastest and most efficient (in terms of code size) method is :-
mov al,b[n]
jnz passover
mov al,a[n]
:passover
mov c[n],al
< this gives you the resultant pixel in c[n], no need to mask or anything >
Mark
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 344190
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:14
Mark,
> mov al,b[n]
> jnz passover
One small point: unlike nice Motorola processors, the mov instruction does
not affect any flags, so you would need a compare of some sort.
Tim's request included a "no branches" constraint, that's why we are being
devious in our ways. <g>
Peter.
...........................................................................
Fm: John Dlugosz [ViewPoint] 70007,4657 # 344361
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 18:39:32
<<Surely the fastest and most efficient (in terms of code size) method is>>
Smallest code, but not the fastest! Jumps are _expensive_. We go to great
lengths to avoid them. Figure 7 clocks plus pipeline delays for your "jnz
passover". That is half the time it takes to multiply, or longer than all
the rest of the instructions in that loop combined.
...........................................................................
Fm: Dan Corritore 70243,1110 # 344165
To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 13:14:12
Yeah.. I forgot about the ! operator. It should be easy to create a mask
doing that.. as such:
mask= -!B[?]; // do a 'not' and then negate it
This way, it will be either -1 (0xff) for a zero value or 0 for a non-zero
value.
_Dan
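A small C sketch of that expression plugged into Tim's formula, assuming unsigned char pixels. Note that -!b is an int (-1 or 0), so it only becomes 0xff once it is converted to an 8-bit unsigned value. The function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* !b is 1 when b == 0, else 0; negating gives -1 or 0 as an int.
   Converting to uint8_t turns -1 into 0xFF. */
uint8_t mask_from_not(uint8_t b)
{
    return (uint8_t)-!b;
}

/* Tim's formula for a single pixel: C = (A & mask) | B. */
uint8_t merge_pixel_masked(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & mask_from_not(b)) | b);
}

/* Usage: mask is all ones only where B is transparent. */
void mask_from_not_example(void)
{
    assert(mask_from_not(0x00) == 0xFF);
    assert(mask_from_not(0x42) == 0x00);
}
```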
...........................................................................
Fm: Mark 'SAM' Baker 100025,444 # 344151
To: TIM 76247,1130 (X) Date: 29-Apr-93 12:54:07
Why do you need to bother with all this masking?
In pseudo-assembler, try this :-
mov al,b[n]
jnz not_b
mov al,a[n]
not_b:
mov c[n],al
This gives you the correct result, without any recourse to booleans.
I think it is probably also the fastest, and the most efficient in code size
that you will find.
OK, a purist wouldn't like the jump (it is a GOTO by any other name), but you
can't avoid them in assembler.
procedure BIT_SET;
begin
C[n]:=B[n];
if (C[n] = 0) then C[n]:=A[n];
end;
Mark
...........................................................................
Fm: Hans Peter Rushworth 100031,473 # 344191
To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:20
Sorry for the repeat:
>> mov al,b[n]
>> jnz not_b
You must insert a cmp al,0 or "or al,al" to correctly set the zero flag. The
mov instruction does not affect it.
Peter.
...........................................................................