home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The Equalizer BBS
/
equalizer-bbs-collection_2004.zip
/
equalizer-bbs-collection
/
DEMOSCENE-STUFF
/
NL-DEM81.ZIP
/
DN_2OF2.081
< prev
next >
Wrap
Text File
|
1995-01-29
|
26KB
|
626 lines
DemoNews.081.continued.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.part.2.of.2
SECTIONS ARTICLES
---------------- -----------------------------------
Code Assembly Part 3 (It ain't no party)
BSP Trees
Back Issues How to Get 'em, Descriptions
Closing Comments Quote for the Week, etc.
.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
_____Assembly Part 3 by Jason Nunn
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\\\\\\\\[ "Implementation Techniques" - Assembly Part III
\\\\\\\[[ By Jason Nunn
\\\\\[[[[
\\[[[[[[[
____________________________________________________________________
In this issue I will be discussing the basic nuts and bolts on how to
implement an assembly program geared towards a demo. This article is
intended for the C or Pascal programmer who hasn't quite got the confidence
to use a full blown assembly compiler. In the first part, you may remember
me telling you of a friend that has reached a turning point in his coding
development, yet he still won't "take the plunge" because he doesn't have
access to all the nice perks of 3GL's, like sine, cosine, and random
functions and larger precision variables than the chip itself. This issue
will hopefully provide that incentive.
But...., before we do that, I would like to first finish off last article's
talk on optimization. I forgot to include the fast string functions. I'm
not going to waffle on too much about this now, as this article is
dedicated to "assembly techniques". All I will do here is show my results,
and give out some tips.
Same things apply on this run - my machine is a 486-33 ISA, operating in
P-mode, and the lower the number, the faster a given instruction is.
STOSB [209729] MOV [EDI],AL [134226]
INC EDI
STOSD [306805] MOV [EDI],EAX [244208]
INC EDI
LODSB [209729] MOV AL,[ESI] [134226]
INC ESI
STOSD [306805] MOV EAX,[ESI] [244208]
INC ESI
MOVSB [228550] MOV AL,[ESI] [142124]
MOV [EDI],AL
INC ESI
INC EDI
MOVSD [384379] MOV EAX,[ESI] [324112]
MOV [EDI],EAX
INC ESI
INC EDI
Of course, REP operations are faster than their equivalent by about half.
In general, it is better to use MOV's and INC's to perform one off
operations. So if you're coding with these instructions, chances are that
you can get a bit more speed out of your code.
Ok, now on with the main talk. Generally, I won't be going into great depth
as there are plenty of tutorials and manuals on the net that explain the
rank basics of assembly. My role here will be to highlight and familiarize
extrordinary things about coding methods of assembly.
If you've never coded in assembly, then it may pay you to write your
equivalent program in a 3GL first and before converting it over. I guess it
depends on the person. I prefer to implement idea's in straight assembler.
I'm comfortable with the language enough to mumble it in my sleep and their
are no barriers or contingencies like there are in 3GL code. You can also
run into serious problems when converting to your target language, but I'm
sure that there are as many negative points about doing this as they are
positive points.
How to crunch huge numbers
--------------------------
One of the first questions a new demo coder may ask is how he/she could
add, multiply or subtract a number that is larger than the precision of the
chip. Well, this really doesn't apply now, as the standard is 32 bits. This
is ample for nearly all calculations, but for those of you that may want to
perform a 64 bit ADD calculation, this is how you do it:
ADD EAX,ECX
ADC EDX,0
In this example, we don't have a 64 bit register, therefore we must make
two data sources, whether they be registers or memory references to act as
one large register. In our case, EDX and EAX act as one. EAX contains the
least significant data and the EDX contains the most significant data of
our 64 bit number. ECX contains the number we are adding to this 64 bit
concatenated register. The basic idea behind this is that we first add ECX
to EAX. If the number in the EAX register "clocks" then the CPU's carry
flag will be set.
The next instruction - ADC (for those of you that don't know) is a funny
sort of ADD instruction that performs two add instructions. It will first
add the source register to the destination register, and then add 1 to the
source register if the carry is set. Hence the name "ADD ON CARRY". In our
example, if the carry flag is set, we will only add in the carry flag as
the source value is zero. Therefore, if the least significant component of
our 64 bit variable (EAX) clocks, the it will carry over to the EDX
component.
Although the above example only adds a 32 number to the 64 bit number. If
you wanted to add a 64 bit number to a 64 number then you would adopt the
following:
ADD EAX,ECX
ADC EDX,0
ADD EDX,EBX
Where EDX:EAX is the destination 64 register, and EBX:ECX is the source
register.
To add larger precision's, we simply chain!. Here we are adding a 32 bit
number that resides in EAX to a 128 bit number which is stored in
EDX,EBX,ECX and ESI.
ADD EDX,EAX
ADC EBX,0
ADC ECX,0
ADC ESI,0
To subtract, the same principle applies, accept we use SUB and SBB
instructions:
(a) (b)
SUB EAX,ECX SUB EAX,ECX
SBB EDX,0 SBB EDX
SUB EDX,EBX
With the 486's math coprocessor, the large multiplication and division is
more viable than our old conventional way of calculating large numbers;
which as you will see and very slow. Pretty soon, I will be exclusively
using coprocessor calculations in my demos, as they are extremely popular
now. Hence rendering the following code (for me) obsolete. However, for
names sake, I'll discuss the old way of doing things...
For multiplying a 64 bit variable to a 32 bit variable you can use this
algorithm:
MOV EAX,ESI
MUL EBX
PUSH EAX EDX
MOV EAX,ESI
MUL ECX
POP ECX EBX
ADD ECX,EAX
As a formula, the code is equivalent to this: ECX:EBX = ECX:EBX*ESI.
Note that you can chain this one also by taking the EDX value from the
second MUL and multiplying it by the next significant register of the
source and adding that answer into the respective register of the
destination.
Dividing is a little bit more complex. How complex?...this complex:
PROC LONG_DIV
OR EBP,EBX
JZ @@jump_0599
PUSH EBP
MOV EBP,ECX
OR EBX,EBX
PUSHF
JNS @@jump_0548
NOT ECX
NOT EBX
ADD ECX,01
ADC EBX,00
@@jump_0548:
OR EDX,EDX
PUSHF
JNS @@jump_0557
NOT EAX
NOT EDX
ADD EAX,01
ADC EDX,00
@@jump_0557:
MOV ESI,ECX
MOV EDI,EBX
XOR ECX,ECX
XOR EBX,EBX
MOV EBP,0021h
@@jump_0562:
RCL ECX,1
RCL EBX,1
SUB ECX,ESI
SBB EBX,EDI
JNB @@jump_0570
ADD ECX,ESI
ADC EBX,EDI
@@jump_0570:
CMC
RCL EAX,1
RCL EDX,1
DEC EBP
JNZ @@jump_0562
POPF
JNS @@jump_058A
NOT ECX
NOT EBX
ADD ECX,01
ADC EBX,00
POPF
JNS @@jump_058D
JMP @@jump_0597
@@jump_058A:
POPF
JNS @@jump_0597
@@jump_058D:
NOT EAX
NOT EDX
ADD EAX,0001
ADC EDX,00
@@jump_0597:
POP EBP
@@jump_0599:
RET
ENDP
This formula divides EDX:EAX by EBX:ECX. Just in case anybody recognizes
this thing, I've reversed it from a certain popular commercial package (not
giving any names) hehe :). I havn't used it since my real mode days
(which, for the record is about 2 years ago when coding TC669), and it's
basically optimized for that. I've made no attempt to optimize it for
P-mode, as I most likely will never use it ever again.
How to implement a Decimal point (or rather - a hexadecimal point :)
--------------------------------------------------------------------
Now that we have discussed the ways in which we can do a whole range of
calculations, your next question is how to implement floating/none discrete
calculations. For that, we must take a register/memory unit and divide it
into two parts. The number and a mantissa. For the sake of efficiency, you
would typically contain this in a single register, namely a 32 bit
register. I usually use this type of construct (represented in binary):
/------------32 bits------------\
NNNNNNNNNNNNNNNN.MMMMMMMMMMMMMMMM
Here you have a 16 bit actual number, with a 16 bit mantissa. As you can
see the actual number is of a higher order than normal. If to wanted to
extract the number from this variable, you can simply perform a SHR 16.
This will arrive you at the "NNN...." component of the number. Here is an
example of 1 and a half:
0000000000000001 1000000000000000b
If we wanted the discrete part of the number (ie the "1" part), then just
perform a SHR 16, which arrives us at: 0000000000000001. As you can see,
there is no real difference between discrete and non-discrete variables. To
the machine, it's all the same thing. The difference is the way you
interpret the product. Calculations are still no different to normal
numbers. If we wanted to add a "half" to this number then it's as simple
that this:
MOV EAX,00000000000000010000000000000000b ; this is a decimal "1"
ADD EAX,00000000000000001000000000000000b ;this is a decimal "0.5"
So, as you can see, it's not very hard. For multiplication, you're going to
have to include a SHRD instruction, as the number will now be in EDX and
the mantissa in EAX, hence the precision is now larger. This will return
the number back to the EAX 32 bit precision that it should be. Here is an
example:
MUL ECX
SHRD EDX,EAX,16
Here, we multiply EAX by ECX, which arrives at EDX:EAX. Then we just step
down this answer to arrive at the result, witch will now be contained in
EAX. With division, it's the opposite:
MOV EDX,0
SHLD EDX,EAX,16
DIV ECX
Here we are dividing EAX by ECX. Note the preparation just before the
divide.
Signed Data
-----------
As of now, we have only discussed unsigned data. Generally speaking, these
calculations are very simular, but there are some major differences.
Contained in a given 32 register, unsigned numbers go from 0 to FFFFFFFFh,
where as 32 signed data range from 80000000h which is the lowest number and
7FFFFFFFh begin the highest number. When using signed data, there are only
a couple of extra things you must know. Signed data has its own
multiplication and division instructions (ie IMUL and IDIV), and its own
set of conditional jump instructions.
JL (jump if less than) and JLE (jump if less than or equal to)
are a signed equivalent to
JB (jump if below) and JBE (jump if below or equal to).
JG (jump if greater) and JGE (jump if greater than or equal to)
are the signed equivalent to
JA (jump if above) and JAE (jump if above or equal to).
To change our unsigned divider from this....
MOV EDX,0
SHLD EDX,EAX,16
DIV ECX
...To a signed divider, simply substitute the MOV EDX,0 with a CDQ. The CDQ
extends a signed number in EAX into EDX. Example given:
CDQ
SHLD EDX,EAX,16
IDIV ECX
Implementing Complex mathematical relationships
-----------------------------------------------
At one time or another, a coder is going to have to use some sort of
complex mathematical function like triangle ratios, logarithmic factors and
random numbers to implement various things. To create a function that maps
a relationship in real time is basically impossible in efficiently terms.
The only way you can do this is to store relationships in the form of
tables. This may not be apparent to users of compilers like turbo C etc but
electronic calculators, compliers, maths coprocessors, spreadsheets all use
this method of mapping these relationships. it a very fast a convenient way
of doing things.
The first common function is the random function. A random signal can be
achieved using the following algorithm. The product of this function is a
random number stored in the EAX register.
;input: NIL; output: EAX
proc random
mov ebx,[random_seed1]
lea ebx,[ebx*4]
mov eax,[ebx+@@rantable]
mov ebx,[random_seed2]
lea ebx,[ebx*4]
add eax,[ebx+@@rantable]
mov [ebx+@@rantable],eax
inc [byte random_seed1]
and [byte random_seed1],01111b
dec [byte random_seed2]
and [byte random_seed2],01111b
ret
random_seed1
dd 2
random_seed2
dd 13
@@rantable:
dd 0fd8fce7ah,02d7ad7b7h,0f48a8f3ab,04a3b8f8bh
dd 0f2dec542h,0a847fab7h,0f4da81aab,04a348f86h
dd 024547edah,03b535a43h,0b35a535ab,0aa333483h
dd 0fd2f4e7ah,0c525a5b7h,016d3b4a4b,0643b4fd3h
endp
If you expand the table to 256 entries then you could eliminate two
instructions, but there again, it's not worth doing. This random function
will give you a very random signal :). There is only one problem with this
algorithm, and that is, the randomness will always follow the same pattern.
If this feature undesirable, then you may like to make an initiation module
that jumbles up the seeds or the numbers a bit. An obvious way of randomly
choosing a seed, would be to store a fixed reference variable in memory.
For example:
proc randomise
mov al,[043253445h]
mov [byte random_seed1],al
mov al,[012345678h]
mov [byte random_seed2],al
ret
endp
Anyway, I'm going to stop here as it's getting very close the deadline
time. One day, I'll learn not to leave things till last minute. In the next
part, I'll be hopefully finishing up this assembly series and moving on to
my talks of sound/tracker programming (the interesting stuff).
I'll be soon releasing a tracker that I have written called FunkTracker.
With this will be the full source code listing. My discussions will be
based around my knowledge and experience when producing current and past
trackers and players, and discussing implementation and hardware issues. I
also plan to discuss reverse engineering using microsoft CodeView, and plan
to obtain hack docs on the AWE32 card. So this will be all coming up!.
until next time.
See ya
:Jason Nunn
_____BSP Trees by Tom Verbeure
Problem situation: sorting polygons is slow and can be incorrect for
certain view-angles. Heavily influenced by Computer Graphics, Principles
and Practice, I have written this small tutorial for BSP trees, which
solves the problem for static objects and for every view angle.
As I already said: Binary Space Partitioning Tree. Unlike many other
abbreviations, this one really explains a lot of the algorithm: it uses a
tree. It partitions space and it partitions in two parts.
First: it is only usefull in static scenes: no 3D morphing or other goodies
are allowed.
Let's go to the 2D case, 3D is exactly the same.
Take a sample scene:
A\ ----- C|
\ B E |
------ |
|
/ .
/ V
D/
The positive side of the polygons is the side with the defining
character... Ignore V for now.
One could sort this thing during rendering, but as there can be no correct
sort criterium and sorting is slow, we don't want that. Besides, we have
memory to spare :-)
Now, we're going to build a tree that is totally viewpoint independent:
Take polygon B as the root. Polygon B divides space in to parts: the
positive and the negative side (Geee!) We have partitoned space in two.
First, scrap B from the 'not-used' polygons-array and classify the
remaining polygons. Group those on the + side, and those on the - side. As
you can see, polygon C is both on the + and the - side. What to do? Create
2 new polygons C+ and C-, erase C. Is there another complainer ? Nope: all
poly's are on either the + or the - side. Now we have this situation:
B
/ \
A,C+,E D,C-
Not really a tree yet, but we've only started...
Now, do the same thing for the groups at the child nodes, without caring
about those in another child-node.
For the + side of B, we have polys A,C+ and E. Take A as next node polygon.
Neither C+ nor E are on it's negative side (we ignore D and C-). For the
other node, take D as next node polygon. Only C- remains there, and it is
on the negative side. That side of the tree is finished. We have the
following situation:
B
/ \
/ \
A D
\ \
C+,E C-
There's one child with more that one poly left. Take C+ as node polygon, E
become it's child, on the positive side. We're finished.
Situation:
B
/ \
/ \
A D
\ \
C+ C-
/
E
Now, what can we do with it? A lot... Suppose the viewpoint is at position
V. In which order do we have to sort the polygons, when using a back to
front rendering algorithm ? Answer: walk the tree, make sure all nodes
(including childs) are visited.
Start at the root. Is V on the positive side? Nope, well, we want the polys
far away first, so walk the positive way. Are we on the positive side of A?
Yep, walk the negative way. It is empty! Ah. Well, draw A first. The go the
positive way. Are we positive of C+? Yep. Negative way of C+ is empty. Draw
C+ poly. Go positive way of C+. E has no child, draw it. Go up until a
non-empty branch is found, draw all node polygon not drawn already. We now
arrive at B again. Draw it. Negative is not visited yet, walk it. We're
negative of D. Positive way is empty. Draw D and go negative. C- has no
child. Draw it. All nodes have been visited. The end.
We have drawn the polygons in following order:
A, C+, E, B, D, C- which is a correct order. The BSP tree has to be
constructed only once and for all. From then on, sorting the polygons is
always correct and in linear time. Standard sorting algorithms can be
proved to be of n*log(n) order of time, so we have an increase in speed as
well.
Disadvantages:
- Memory: one has to have the tree in memory. This can be substantial for
lots of polygons.
- Polygon splitting: one ends up with more split polygons. It is almost
always unavoidable to do splitting.
- Polygons are not allowed to move.
A BSP tree is NOT unique: just pick another polygon as a node and one gets
a different one. In this case, one can avoid splitting polygons: start with
a root and build the following, correct, BSP tree:
C
/
B
/ \
A D
/
E
Tadaam! No polygon splitting!
Building a tree with a few splitting as possible is an exponential of the
number of polygons. As Foley and Van Dam says, just try a limited number of
nodepolygons, pick the one with the least splitting and the tree will be
good enough.
Voila. That's it. Not too difficult I think. Notice BSP trees are also
usefull to sort objects, by using planes that divide the space in such a
way that Object A is on the negative and Object B is on the positive side
of the plane. Very useful (only for non-intersecting objects).
This text is written without the Bible (Computer Graphics, P&P) besides me,
but since I read their chapter about BSP trees many times, it contains
almost the same info.
For polygon splitting algorithmes, there is one in Graphics Gems. I don't
know which one, but buy all four books, you won't be disappointed... :-)
Tom Verbeure
Synergy Design
.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
<<Back Issues>>
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
_____How to Get 'em
After reading this issue of DemoNews, you may be wondering how you can get
previous ones. Well fear not! There are two different ways to do so:
1: FTP to hornet.eng.ufl.edu and go to /pub/msdos/demos/news/OLD_NEWS and
start downloading anything you see.
2: Now you can request back issues of DemoNews via e-mail. Start a letter
to listserver@oliver.sun.ac.za (any subject line) and in the body of the
letter include "get demuan-list <index>" where INDEX refers to the
index number of the issue.
For example: get demuan-list 43
This would retrieve DemoNews #76 (part 1 of 2).
For more recent issues that are split into multiple parts, you must send
an individual request for each index number.
_____Descriptions
Issue Index Date Size Description
----- ----- -------- ------ ----------------------------------------------
75 41,42 12/18/94 68009 A DemoNews Reader, The Birth of Commercial
Life, Editorial: Calm Before the Storm,
Interview with Mello-D, US Demo Scene
(Renaissance meeting), Jelly Tots and Pizza
Shops, Review of Wired '94 Graphics.
76 43,44 12/25/94 92589 Interview with EMF, DemoNews Readers Write,
Kimba's Life Story, X-Mas in the Demo Scene,
CORE, Demo & Music Database, Interview with
Purple Motion/Future Crew, Interview with
Krystall/Astek, Common Sense ][ by Perisoft,
Its X-Mas in Africa, Interview with Maxwood
of Majic 12, Assembly Part ][, Common Sense
Response by Stony.
77 45,46 01/01/95 101100 Chart History, Snowman Near-Disaster, Son of
Snowman, The Party 1994, Making Waves, Using
Assembly Part 2.
78 47-49 01/08/95 111185 The Party 1994: Results and Reviews, Report
by Stony and Friends, What happened to PC-
Demo competition. Editorial: TP94 = ASM94
part 2. Egg2: Trancescrambled Review, More
on Fast Tracker 2.03. General Rambling by
Denthor.
79 51 01/15/95 41832 A Day in the Life of Snowman, Ambient Sample
CD 1, Where's the Sound Blaster, TP94
Graphics review.
80 55 01/22/95 27028 DemoNews/HTML, Traffic Jam, CodeThink(School);
The Solo Sample CD
.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
<<Closing Comments>>
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The quote this week comes from "Assembly Language for the PC, Third
Edition". p.174
"A program is never done...but it must be stopped somewhere."
This was intended as a moral for programmers, but with a little rewording
the message is applicable to many areas in life.
See you in CyberSpace,
-Christopher G. Mann (Snowman)-
r3cgm@dax.cc.uakron.edu
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,End.of.DemoNews.081.