CHAPTER 26 - SIMPLIFYING THE TEMPLATE
By the time you have finished this chapter your assembler files
will look cleaner. Unfortunately there is some heavy sledding
before we get there.
EXITING A PROGRAM
Till now, we have exited most programs with CTRL-C; otherwise the
program has done a return. A return to what? It has been
returning to a section of code that does INT 20h, one of the ways
of quitting a program when everything is in order. Notice the
"everything is in order" in the last sentence. What happens if
you have 2 files open, you are off in some subroutine, and you
have things so hopelessly confused that you might as well give
up? Can you call INT 20h? The answer is no for two reasons.
First, you need CS to point to the PSP (program segment prefix),
and you don't know where the PSP is. Secondly, you need to close
files. Now, it is possible to make some code to do this, but why
bother? We have a special interrupt for this:
INT 21h function 4Ch
AH = 4Ch
AL = return code
This will close all files, get you out of the program, and give a
return code that is usable by the calling program. Here's a small
program. Use template.asm and call this TEST4CH.ASM.
; - - - - - - - - - - - - - - - - - - - - - - - - -
CODESTUFF SEGMENT PUBLIC 'CODE'
ASSUME cs:CODESTUFF, ds:DATASTUFF
EXTRN get_unsigned_byte:NEAR
main proc far
start:
mov ax, DATASTUFF ; load ds
mov ds,ax
call get_unsigned_byte ; value is in al
mov ah, 4Ch ; int 21h, function 4Ch
int 21h
main endp
CODESTUFF ENDS
; - - - - - - - - - - - - - - - - - - - - - - - - - -
We have revised the CODESTUFF segment so there is only one EXTRN
statement and the beginning code:
The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson
push ds
sub ax, ax
push ax
is gone. Since we will never again do a return to the PSP, there
is no need for this code anymore.
The program gets a single byte to use as the exit code and then
exits using int 21h function 4Ch. Get this assembled and linked.
It should ask for a number and then exit. But where is that
number? It is available through the operating system. Make the
following batch file. It runs TEST4CH.EXE and then looks at the
return code. Unfortunately ERRORLEVEL is not available as an exact
number to a batch file. IF ERRORLEVEL n is true when the return
code is n or higher, so we are checking to see if the return code
was above a certain level.
----------------- DO4CH.BAT -----------------------
test4ch
ECHO OFF
IF ERRORLEVEL 1 ECHO The return code was over 0
IF ERRORLEVEL 51 ECHO The return code was over 50
IF ERRORLEVEL 101 ECHO The return code was over 100
IF ERRORLEVEL 151 ECHO The return code was over 150
IF ERRORLEVEL 201 ECHO The return code was over 200
ECHO ON
----------------------------------------------------
Here's one run of the batch file:
>do4ch
>test4ch
The PC Assembler Helper Version 1.01
Copyright (C) 1989 Chuck Nelson All rights reserved.
Enter a number from 0 to 255 172
>ECHO OFF
The return code was over 0
The return code was over 50
The return code was over 100
The return code was over 150
This is what happens to DO4CH.BAT with a return code of 172.
From now on, always use INT 21h function 4Ch to exit.
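Because function 4Ch does not care where you are in the program, you can bail out from any depth. Here is a minimal sketch of that idea. The names fatal_exit and ERR_CONFUSED are made up for this illustration and are not part of ASMHELP; both pieces are assumed to sit in the same code segment.

```asm
; Sketch only: fatal_exit and ERR_CONFUSED are invented names.
ERR_CONFUSED    EQU  2          ; any value from 0 to 255 will do

fatal_exit      proc near       ; on entry: AL = return code
                mov  ah, 4Ch    ; int 21h, function 4Ch
                int  21h        ; control never comes back here
fatal_exit      endp

; ... and this is how a hopelessly confused subroutine gives up:
                mov  al, ERR_CONFUSED
                jmp  fatal_exit ; whatever is on the stack no longer matters
```

We JMP instead of CALL because nothing will ever return. A batch file like DO4CH.BAT can then test the code with IF ERRORLEVEL.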
SEGMENTS
Our major simplification has to do with segment names. Before we
go on with segment simplification, here are the rules the linker
uses. If you don't remember them, you should review Chapter 10
before going on.
During the link process, the linker will combine any segments
which:
1) have the same name.
2) are declared PUBLIC.
3) have the same class name (type).
The linker processes object modules from left to right on the
command line. The classes will be placed in the order in which
they were first encountered (including the empty class type).
Within each class, the segments will be placed in the order in
which they were first encountered.
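As a small illustration of the combining rules, here are two separate source files that each contribute to one segment. The file names MOD1.ASM and MOD2.ASM, the segment name SHARED and the variable names are all hypothetical.

```asm
; ---- MOD1.ASM (hypothetical file) ----
        PUBLIC var_a
SHARED  SEGMENT WORD PUBLIC 'DATA'
var_a   dw   1111h
SHARED  ENDS
        END

; ---- MOD2.ASM (hypothetical file, assembled separately) ----
        PUBLIC var_b
SHARED  SEGMENT WORD PUBLIC 'DATA'
var_b   dw   2222h
SHARED  ENDS
        END
```

Both pieces have the same name, are PUBLIC and have the same class, so the linker should join them into a single SHARED segment. If MOD1.OBJ comes before MOD2.OBJ on the command line, var_a should land at offset 0 and var_b at offset 2.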
If we have all these rules, how do high-level languages manage to
combine their data and code correctly? The answer is that they
use standardized segment definitions. Here are the basic ones for
our data:
;---------------------------------------------------
_DATA SEGMENT WORD PUBLIC 'DATA'
_DATA ENDS
;---------------------------------------------------
;---------------------------------------------------
CONST SEGMENT WORD PUBLIC 'CONST'
CONST ENDS
;---------------------------------------------------
;---------------------------------------------------
_BSS SEGMENT WORD PUBLIC 'BSS'
_BSS ENDS
;---------------------------------------------------
;---------------------------------------------------
STACK SEGMENT PARA STACK 'STACK'
STACK ENDS
;---------------------------------------------------
If all the code will fit in one segment we can use a single
segment name:
;---------------------------------------------------
_TEXT SEGMENT WORD PUBLIC 'CODE'
_TEXT ENDS
;---------------------------------------------------
otherwise we can make independent segments, each with an
independent name:
;---------------------------------------------------
name_TEXT SEGMENT WORD PUBLIC 'CODE'
name_TEXT ENDS
;---------------------------------------------------
where the "name" can be anything, but the "_TEXT" remains
invariable. Any subroutine calls within the segment can be NEAR,
while any calls to a different segment should be FAR.
The "WORD" in these definitions says that when the linker
combines segments into a larger segment, each subsegment must
start at an even address (a word boundary). This has to do with
the speed of word fetches from memory that we discussed in the
last chapter. "WORD" is fine for 16 bit data busses, but for an
80386 you actually want "DWORD" so things are correctly aligned
with a 32 bit data bus. "PARA" means paragraph and that means
aligned with a segment starting address (every 16 bytes).
Everything will work with "PARA".
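Alignment is just rounding an address up to the next multiple of 2 (WORD) or 16 (PARA). The following fragment sketches that arithmetic with ordinary instructions. The starting value 1001 is an arbitrary illustration, and this is not the linker's actual code.

```asm
; Rounding an address up to a WORD and to a PARA boundary.
        mov  ax, 1001       ; first free address (arbitrary example)
        add  ax, 1
        and  ax, 0FFFEh     ; AX = 1002, the next multiple of 2
        mov  bx, 1001
        add  bx, 15
        and  bx, 0FFF0h     ; BX = 1008, the next multiple of 16
```

The gap between the old address and the rounded one is simply wasted space, which is why "PARA" always works but can cost up to 15 bytes per segment.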
For reasons of convenience, compilers put different types of data
in different segments. For you, there is no reason to use more
than one segment, and that is:
;--------------------------------
_DATA SEGMENT WORD PUBLIC 'DATA'
_DATA ENDS
;--------------------------------
Compilers use these different segments because they can. If they
were constrained to use only one segment name, they could do it
with no problem. What is in these different segments?
_DATA standard initialized data
CONST data constants
_BSS uninitialized static data
STACK room for the SS:SP stack
So what do these things mean?
_DATA
The _DATA segment stores all initialized data which exists from
the time the program starts till the time that the program ends.
In C:
static int x = 5;
In Pascal:
const
my_salary : real = 52.77;
These variables have a specific value at the start of the
program, even before the first instruction is executed. This
value may change during the program. The variable exists during
the whole program.
_BSS
The _BSS segment stores all uninitialized data which exists from
the time the program starts till the time that the program ends.
In C:
static int x ;
In Pascal, any variable declared outside a procedure but without
an initial value will be in _BSS. In compiled BASIC, everything
except dynamic arrays is in the _BSS. These variables have an
indeterminate value at the start of the program and exist during
the whole program.
CONST
CONST takes all constants which are longer than 2 bytes. If the
compiler is on its toes, anything one or two bytes long will be
coded into the machine instructions since this is much faster.
What is a constant? It is anything that has a value but doesn't
have a variable name:
value = 275.29 ;
printf ( "Mr. Yellow: 'Read my lips - no new taxis!'\n");
result = value / 27.619 ;
file_ptr = fopen ( "stuff.doc", "r+") ;
All the numbers and all the text strings need to be stored
somewhere. They are stored in the CONST segment and given an
internal name by the compiler so they can be used at the
appropriate location. They are not available in other parts of
the program.{1} These constants are sometimes called literals.
STACK
BASIC does not use the stack in the same way as Pascal and C. In
BASIC it is used only for passing variables between subroutines.
In C and Pascal, most variables are temporary. They come into
existence at the beginning of the subroutine and they disappear
upon leaving the subroutine. When you call the subroutine again,
the values these variables have are indeterminate. These
variables all exist on the stack relative to BP, the base
pointer. This is why you can have recursion in C and Pascal but
not in BASIC.
As I said, you don't need to put your different types of data in
different segments. It can all go into _DATA.
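For comparison, here is a sketch of how a compiler might split a few pieces of data among the segments. The variable names are invented for this illustration, and the comments show the C declarations they loosely correspond to.

```asm
_DATA   SEGMENT WORD PUBLIC 'DATA'
x       dw   5                  ; like "static int x = 5;"
_DATA   ENDS
CONST   SEGMENT WORD PUBLIC 'CONST'
fname   db   "stuff.doc", 0     ; a string literal with its ending zero
CONST   ENDS
_BSS    SEGMENT WORD PUBLIC 'BSS'
y       dw   ?                  ; like "static int y;" (no initial value)
_BSS    ENDS
```

In your own programs all three of these lines could sit in _DATA and nothing would change as far as the code is concerned.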
GROUPS
We now come to the bizarre. You will notice that when the linker
links all these object modules together, it will have four
distinct segments with each segment having a distinct class name.
We will get:
_DATA 'DATA'
CONST 'CONST'
_BSS 'BSS'
STACK 'STACK'
The problem here is that we want to set DS at the beginning of
the program so that it will reference all the data. How are we
going to do this? The warped minds of electrical engineers and
computer scientists spent hours and hours trying to find the most
obscure way possible to unify data addressing and they came up
with GROUPS.
____________________
1. There is an exception. Some compilers check to make sure
that there are no duplicates of the constant. These compilers
give all duplicates the same address so there is only one copy of
any one constant such as 0, 1, etc.

You can tell the linker that you want data from distinct segments
to be referenced by the offset from the beginning of the lowest
segment in memory that belongs to the group. Read this about five
or ten times to get the hang of it. You tell the linker that a
bunch of different segments belong to a group. It will find the
segment which is lowest in memory and then whenever you ask for
the GROUP offset, the linker will calculate the offset from the
beginning of this first segment.
The way you define a group is with a name, the word "GROUP", and
then a list of those segments in the file which belong to the
group:
DGROUP GROUP _DATA, CONST, _BSS, STACK
Note that it is the segment names, not the class names. DGROUP is
the standard name for the data group. If the assembler gives the
linker the correct information, the linker will adjust all
offsets relative to the beginning of the group. The only limit on
a group is that the distance from the first byte of the group to
the last byte of the group must be 65535 bytes or less. This is
because all the group segments must reside in one physical
segment in memory.
It is not even necessary for all the segments in a block of
memory to belong to the group. Consider the following ordering of
segments in memory.
_DATA
DATASTUFF
CONST
CODESTUFF
_BSS
EVENMORESTUFF
STACK
As long as the distance from one end of _DATA to the other end of
STACK is 65535 bytes or less, the linker will adjust the offsets
in _DATA, CONST, _BSS and STACK relative to the start of DGROUP
and the linker will adjust the offsets of DATASTUFF, CODESTUFF
and EVENMORESTUFF relative to their respective segment starting
addresses. I didn't say that this was good programming, I only
said that it was possible.
Thoroughly confused? You're not alone. Just remember, in all
compiled languages, we are going to combine these four types of
segments into a single group where offsets are relative to the
very beginning of the data.
Before getting you even more confused, let's take a look at what
we have so far. Make sure you actually do all of the following
examples. Use template.asm and at the very top, put in the
following segments:
; - - - - - - - - -
SEG1 SEGMENT 'STUFF'
db 100 dup (?)
seg1_data db ?
db 899 dup (?)
SEG1 ENDS
; - - - - - - - - -
SEG3 SEGMENT 'STUFF'
db 300 dup (?)
seg3_data db ?
db 699 dup (?)
SEG3 ENDS
; - - - - - - - - -
SEG5 SEGMENT 'STUFF'
db 500 dup (?)
seg5_data db ?
db 499 dup (?)
SEG5 ENDS
; - - - - - - - - -
Call this program QGROUP1.ASM. These segments are 1000 bytes
long, and the data names are 100, 300 and 500 bytes into their
respective segments. Because these segments will be paragraph
aligned, the second and third segments will start 1008 bytes (16
X 63) after the preceding one. You need to tell the assembler
that these are in a group and give the proper ASSUME statement.
We'll call this QGROUP:
QGROUP GROUP SEG1, SEG3, SEG5
ASSUME cs:CODESTUFF, ds:DATASTUFF, ds:QGROUP
Here's some code:
; + + + + + + + + + + + + START CODE BELOW THIS LINE
lea ax, seg1_data
call print_unsigned
lea ax, seg3_data
call print_unsigned
lea ax, seg5_data
call print_unsigned
; + + + + + + + + + + + + END CODE ABOVE THIS LINE
As you can see, all we are doing is putting the addresses into AX
and then printing them as unsigned numbers. Here's the output:
00100
01308
02516
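These three numbers can be checked with assembly-time arithmetic. The sketch assumes the layout described above (1000-byte segments, each paragraph aligned). The EQU names are invented for this check and are not part of QGROUP1.ASM.

```asm
SEG_LEN     EQU  1000                       ; bytes in each segment
SEG_STRIDE  EQU  (SEG_LEN + 15) AND 0FFF0h  ; 1008, next multiple of 16
SEG1_CHECK  EQU  0 * SEG_STRIDE + 100       ; 100  = offset of seg1_data
SEG3_CHECK  EQU  1 * SEG_STRIDE + 300       ; 1308 = offset of seg3_data
SEG5_CHECK  EQU  2 * SEG_STRIDE + 500       ; 2516 = offset of seg5_data
```

Each group offset is the distance from the start of the group to the start of the segment, plus the offset inside the segment.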
Remember, each segment is starting 1008 bytes after the start of
the previous one. Here's the same program with a few extra
segments thrown in. Call it QGROUP2.ASM:
; - - - - - - - - -
SEG1 SEGMENT 'STUFF'
db 100 dup (?)
seg1_data db ?
db 899 dup (?)
SEG1 ENDS
; - - - - - - - - -
SEG2 SEGMENT 'STUFF'
db 200 dup (?)
seg2_data db ?
db 799 dup (?)
SEG2 ENDS
; - - - - - - - - -
SEG3 SEGMENT 'STUFF'
db 300 dup (?)
seg3_data db ?
db 699 dup (?)
SEG3 ENDS
; - - - - - - - - -
SEG4 SEGMENT 'STUFF'
db 400 dup (?)
seg4_data db ?
db 599 dup (?)
SEG4 ENDS
; - - - - - - - - -
SEG5 SEGMENT 'STUFF'
db 500 dup (?)
seg5_data db ?
db 499 dup (?)
SEG5 ENDS
; - - - - - - - - -
This is almost the same thing but we have added two more
segments. We are NOT going to join these two segments into the
group. Here are the GROUP and ASSUME statements:
QGROUP GROUP SEG1, SEG3, SEG5
ASSUME ds:SEG2, ds:SEG4
ASSUME cs:CODESTUFF, ds:DATASTUFF, ds:QGROUP
Make sure the ASSUME statements are in that order or things may
get confused. We also add some code:
; + + + + + + + + + + + + START CODE BELOW THIS LINE
lea ax, seg1_data
call print_unsigned
lea ax, seg2_data
call print_unsigned
lea ax, seg3_data
call print_unsigned
lea ax, seg4_data
call print_unsigned
lea ax, seg5_data
call print_unsigned
; + + + + + + + + + + + + END CODE ABOVE THIS LINE
This shows the addresses of all five variables. Here's the new
output:
00100
00200
02316
00400
04532
As you can see, the GROUPed segments have their offsets relative
to the beginning of the group while the others have their offsets
relative to the beginning of the segment.
Make a copy of QGROUP1.ASM and call it QGROUP?.ASM. Leave the
segment definitions, group definitions, ASSUME statements and
code the same, but add six more lines of code at the end:
; compare the offsets
mov ax, offset seg1_data
call print_unsigned
mov ax, offset seg3_data
call print_unsigned
mov ax, offset seg5_data
call print_unsigned
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
After the three LEAs we now do 3 OFFSETS. Assemble and link this.
Here's the output:
00100
01308
02516
00100
00300
00500
Wait a minute! Those last three numbers should be the same as the
first three numbers. That's right, folks. This is a known error
in the MASM assembler. In fact the Turbo Assembler copies this
mistake when it is in "MASM" mode but does it right when it is in
"IDEAL" mode. A86 does it right all the time. Here is the output
from the same source file when assembled by A86:
00100
01308
02516
00100
01308
02516
You have told the assembler to calculate all offsets relative to
the beginning of the group and MASM is ignoring you every time
you use the OFFSET operator. The code fix for this is to use an
override when you use OFFSET:
; compare the offsets
mov ax, offset QGROUP:seg1_data
call print_unsigned
mov ax, offset QGROUP:seg3_data
call print_unsigned
mov ax, offset QGROUP:seg5_data
call print_unsigned
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
Better yet, use LEA whenever possible. If you do use OFFSET with
groups, you need to go through the text file with a word search
to make sure that all OFFSETs have a group override. This is a
subtle error and it is very hard to find if you are not looking
for it.
This system is designed so that we can have 64k of data and
stack, all of which is addressable with DS without changing DS's
value. What happens if you have more data than that? One thing
for sure is that you don't have more than 64k of individually
named variables. Either that or you have some huge calluses on
your typing fingers.
What you do have is arrays. If you run into space problems, you
move the least used or the biggest arrays into their own
segments. You can have one segment per array if you want. The
standardized high-level language names for these segments are:
; - - - - - - - - - - - - - - - - - - - - -
FAR_DATA SEGMENT PARA 'FAR_DATA'
FAR_DATA ENDS
; - - - - - - - - - - - - - - - - - - - - -
FAR_BSS SEGMENT PARA 'FAR_BSS'
FAR_BSS ENDS
; - - - - - - - - - - - - - - - - - - - - -
Once again, the '_DATA' is for initialized data while the '_BSS'
is for uninitialized data. Use only the 'FAR_DATA' kind.{2} You
will notice that these segments are NOT PUBLIC. Although an
assembler will unify all segments with the same definition that
are in the same file, the linker will not unify segments from
different files which are not PUBLIC. If we create 4 different
.ASM files, each with one segment:
; FARDATA1.ASM
PUBLIC data1
; - - - - - - - - - - - - - - - - - - - - -
FAR_DATA SEGMENT PARA 'FAR_DATA'
data1 db 1A67h dup (0)
FAR_DATA ENDS
; - - - - - - - - - - - - - - - - - - - - -
____________________
2. A high-level language has the right to set all the data of
a 'BSS' segment to zero as part of its startup routine. Whether
it does so or not depends on what it has told the linker. If you
put initialized data into either a '_BSS' or a 'FAR_BSS' segment,
it might easily wind up zero after startup.
____________________
; FARDATA2.ASM
PUBLIC data2
; - - - - - - - - - - - - - - - - - - - - -
FAR_DATA SEGMENT PARA 'FAR_DATA'
data2 db 0D4A8h dup (0)
FAR_DATA ENDS
; - - - - - - - - - - - - - - - - - - - - -
; FARDATA3.ASM
PUBLIC data3
; - - - - - - - - - - - - - - - - - - - - -
FAR_DATA SEGMENT PARA 'FAR_DATA'
data3 db 200h dup (0)
FAR_DATA ENDS
; - - - - - - - - - - - - - - - - - - - - -
; FARDATA4.ASM
PUBLIC data4
; - - - - - - - - - - - - - - - - - - - - -
FAR_DATA SEGMENT PARA 'FAR_DATA'
data4 db 8716h dup (0)
FAR_DATA ENDS
; - - - - - - - - - - - - - - - - - - - - -
and link these with TEMPLATE.OBJ and ASMHELP, we will get the
following .MAP file:
Start Stop Length Name Class
00000H 01A66H 01A67H FAR_DATA FAR_DATA
01A70H 0EF17H 0D4A8H FAR_DATA FAR_DATA
0EF20H 0F11FH 00200H FAR_DATA FAR_DATA
0F120H 17835H 08716H FAR_DATA FAR_DATA
17840H 1823FH 00A00H STACKSEG STACK
18240H 1875DH 0051EH DATASTUFF DATA
18760H 1A02FH 018D0H CODESTUFF CODE
Program entry point at 1876:0000
The numbers in the segment definitions were in hex so you could
read the .MAP file more easily. We have created four different
FAR_DATAs - one for each variable.
The idea here is to leave DS alone if possible and use ES:SI or
ES:DI for your manipulation of the array.
mov ax, seg data1
mov es, ax
mov si, offset data1
Of course, if you are using two different FAR_DATA arrays from two
different segments at the same time, you will probably need to
use DS temporarily. This is the kind of thing you need to plan
before you start a program which contains large arrays.
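Here is a sketch of working on a far array without touching DS. It fills data1 from FARDATA1.ASM with a single value using ES:DI. The fill value is arbitrary, and the EXTRN line is an assumption about how the calling module would declare the array.

```asm
        EXTRN data1:BYTE        ; assumed: placed outside any SEGMENT block

; the rest goes inside the code segment:
        mov  ax, seg data1
        mov  es, ax
        mov  di, offset data1   ; ES:DI -> first byte of data1
        mov  cx, 1A67h          ; size of data1 in FARDATA1.ASM
        mov  al, 0FFh           ; fill value, chosen arbitrarily
        cld                     ; make STOSB count upward
        rep  stosb              ; store AL at ES:[DI], CX times
```

STOSB always writes through ES:DI, so DS keeps pointing at your normal data the whole time. If you use SI with ES instead, you need a segment override such as es:[si].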
You have now seen all possible segments for any Microsoft
language and for Turbo C.{3} These are:
_DATA SEGMENT WORD PUBLIC 'DATA'
CONST SEGMENT WORD PUBLIC 'CONST'
_BSS SEGMENT WORD PUBLIC 'BSS'
STACK SEGMENT PARA STACK 'STACK'
_TEXT SEGMENT WORD PUBLIC 'CODE'
name_TEXT SEGMENT WORD PUBLIC 'CODE'
FAR_DATA SEGMENT PARA 'FAR_DATA'
FAR_BSS SEGMENT PARA 'FAR_BSS'
DGROUP GROUP _DATA, CONST, _BSS, STACK
We have another problem on our road to simplification. We want DS
to have the address of the start of DGROUP. How do we do it?
Well, before we had:
mov ax, DATASTUFF
mov ds, ax
DATASTUFF was a segment. We do the same thing for groups:
mov ax, DGROUP
mov ds, ax
We use a group name instead of a segment name. This means that
our ultimate code segment will look like this
; - - - - - - - - - -
_TEXT SEGMENT WORD PUBLIC 'CODE'
DGROUP GROUP _DATA, CONST, _BSS, STACK
ASSUME cs:_TEXT, ds:DGROUP
start:
mov ax, DGROUP
mov ds, ax
; - - - - - - - - - -
; the program goes here
; - - - - - - - - - -
mov ah, 4Ch
mov al, ? ; replace ? with error code
int 21h
_TEXT ENDS
; - - - - - - - - - -
____________________
3. If you are using Turbo PASCAL, then there are only two
segments possible. They are:
DATA SEGMENT WORD PUBLIC
CODE SEGMENT BYTE PUBLIC
There is no class name. You can substitute DSEG for DATA and CSEG
for CODE if you want. Turbo Pascal has no DGROUP.

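To see the pieces working together, here is a sketch of a complete small program built from the standard definitions. The variables count and limit are invented for this example, and the stack size is arbitrary.

```asm
; Sketch of a whole program; count and limit are illustrative names.
_DATA   SEGMENT WORD PUBLIC 'DATA'
count   dw   5
_DATA   ENDS
CONST   SEGMENT WORD PUBLIC 'CONST'
limit   dw   100
CONST   ENDS
STACK   SEGMENT PARA STACK 'STACK'
        dw   100h dup (?)
STACK   ENDS
DGROUP  GROUP _DATA, CONST, STACK
_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME cs:_TEXT, ds:DGROUP
start:
        mov  ax, DGROUP
        mov  ds, ax
        mov  ax, count      ; offset is relative to the start of DGROUP
        add  ax, limit      ; CONST is reached with the same DS
        mov  ah, 4Ch        ; AL holds 105, which becomes the return code
        int  21h
_TEXT   ENDS
        END  start
```

The two variables are in different segments, but one value of DS reaches both because they belong to the same group.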
Say, if all this stuff is standardized text, why are we forced to
type all this drivel over and over again? The answer is that we
aren't. All the segment information has a shorthand. Here's how
it works. Every shorthand symbol starts with a dot. The assembler
will then generate the desired text.{4} This is from MASM 5.0 on,
so if you have an earlier assembler you'll have to write the full
text.
To start out, use the two starting directives DOSSEG (with no
dot) and .MODEL. .MODEL will be explained later.{5}
DOSSEG
.MODEL Medium
For now, 'medium' is what we want.
From that point, if you want a data segment, you just write
.DATA, if you want code, you write .CODE. Every time that the
assembler sees a segment directive it will close any segment that
is open and start the segment indicated by the directive. (You
can always reopen a segment). Here is what replaces the
directives:
DIRECTIVE REPLACEMENT TEXT
.DATA _DATA SEGMENT WORD PUBLIC 'DATA'
.CONST CONST SEGMENT WORD PUBLIC 'CONST'
.DATA? _BSS SEGMENT WORD PUBLIC 'BSS'
.STACK [size] STACK SEGMENT PARA STACK 'STACK'
.CODE _TEXT SEGMENT WORD PUBLIC 'CODE'
.CODE [name] name_TEXT SEGMENT WORD PUBLIC 'CODE'
.FARDATA [name] FAR_DATA SEGMENT PARA 'FAR_DATA'
.FARDATA? [name] FAR_BSS SEGMENT PARA 'FAR_BSS'
The [name] in brackets will be explained in a minute. The [size]
after the stack declaration allows you to customize the size of
the stack. Without any size, the declaration
.STACK
will allocate 1k of memory for the stack. A size allocates a
specific number of bytes:
.STACK 2000h
You can make it anything you want, but make sure it is an even
number and remember that the limit for all four parts of DGROUP
is 64k.
____________________
4. It really generates no text. It is just that the assembler
will generate the same machine code as if that text had been
generated.
5. DOSSEG tells the assembler to tell the linker that the .EXE
file should have the standard segment order. It is not necessary
but it doesn't hurt.

To see how the names work, we need some text files. Here is a
complete main file:
; FARDATA.ASM - driver module
DOSSEG
.MODEL medium
EXTRN data2_routine:FAR, data3_routine:FAR
.STACK 200h
.FARDATA
data1 db 0100h dup (0)
.CODE
main:
mov ax, DGROUP
mov ds, ax
call data2_routine
call data3_routine
mov ax, 4C00h
int 21h
END main
It has some data and some code though it doesn't really do
anything. We will use this along with two other files for the
examples. Here is FARDATA2:
; FARDATA2.ASM
DOSSEG
.MODEL medium
PUBLIC data2_routine
.FARDATA
data2 db 0200h dup (0)
.CODE
data2_routine proc
ret
data2_routine endp
END
Notice that data2_routine doesn't have a FAR or NEAR. That's
being taken care of by the memory model. Data2_routine's type
does need to be declared EXTRN in the main module. The third
routine has similar code. Here is the .MAP file when they are
combined:
Start Stop Length Name Class
00000H 00013H 00014H FARDATA_TEXT CODE
00014H 00014H 00001H FARDATA2_TEXT CODE
00016H 00016H 00001H FARDATA3_TEXT CODE
00020H 0011FH 00100H FAR_DATA FAR_DATA
00120H 0031FH 00200H FAR_DATA FAR_DATA
00320H 0061FH 00300H FAR_DATA FAR_DATA
00620H 00620H 00000H _DATA DATA
00620H 0081FH 00200H STACK STACK
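The third module is not listed. Judging from the 00300H FAR_DATA entry and the one-byte FARDATA3_TEXT entry in the map, it presumably looks something like the following sketch. The name data3 and its size are inferences from the map and from FARDATA2.ASM, not the author's actual file.

```asm
; FARDATA3.ASM - a guess at the unlisted third module
DOSSEG
.MODEL medium
PUBLIC data3_routine
.FARDATA
data3 db 0300h dup (0)      ; size inferred from the 00300H map entry
.CODE
data3_routine proc          ; FAR by default in the medium model
ret
data3_routine endp
END
```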
You can see the FAR_DATAs there, but where did the FARDATA3_TEXT
come from? The assembler decided that we wanted independent code
segments and gave each one the name of the assembler file it came
from. Since all the object files in a program must have unique
names, these segment names should also be unique. If we change
the .MODEL from MEDIUM to COMPACT without touching anything else,
then we get:
Start Stop Length Name Class
00000H 00022H 00023H _TEXT CODE
00030H 0012FH 00100H FAR_DATA FAR_DATA
00130H 0032FH 00200H FAR_DATA FAR_DATA
00330H 0062FH 00300H FAR_DATA FAR_DATA
00630H 00630H 00000H _DATA DATA
00630H 0082FH 00200H STACK STACK
If we now put a name after the .FARDATA directive, it will give
the segment a unique name. Putting:
.FARDATA jake_the_snake
in FARDATA2.ASM, along with name changes in the other modules
results in the following .MAP file:
Start Stop Length Name Class
00000H 00022H 00023H _TEXT CODE
00030H 0012FH 00100H HACKSAW FAR_DATA
00130H 0032FH 00200H JAKE_THE_SNAKE FAR_DATA
00330H 0062FH 00300H HULKSTER FAR_DATA
00630H 00630H 00000H _DATA DATA
00630H 0082FH 00200H STACK STACK
We are doing a number of interrelated things here, so let's try
to unify what is going on. You have seen both NEAR and FAR
routines in the Tutor. A NEAR routine alters IP and restores IP
on the return. A FAR routine alters both CS and IP and restores
them on the return.
When we passed addresses of data, we have almost always passed
just the offset of the data. That is because the data has almost
always been in the DATASTUFF SEGMENT, and the value of DS has
been known. In Chapter 19 we did "move_pascal_string" which was a
subroutine where we passed both the segment and offset of the
data. These are our two choices for passing addresses:
OFFSET 1 word
SEGMENT:OFFSET 2 words
This gives us four basic possibilities for program structure:
SUBROUTINE CALL DATA ADDRESSES PASSED AS
NEAR OFFSET
FAR OFFSET
NEAR SEGMENT:OFFSET
FAR SEGMENT:OFFSET
Each of these structural possibilities has a name called a MODEL
name. They are:
SUBROUTINE CALL ADDRESSES PASSED AS MODEL NAME
NEAR OFFSET SMALL
FAR OFFSET MEDIUM
NEAR SEGMENT:OFFSET COMPACT
FAR SEGMENT:OFFSET LARGE
You tell the assembler which model you are working with by using
the .MODEL directive:
.MODEL medium
The assembler will then make either NEAR or FAR the default type.
This can be overridden if you have explicitly given a NEAR or
FAR:
my_proc proc
will generate the correct subroutine calls and returns for that
model, while:
my_proc1 proc near
my_proc2 proc far
will remain unaltered.
At the assembler level, you need to code the address passing
yourself, but if you have a MODEL and you are connected to a
high-level language (with the same .MODEL type), the high-level
language will pass all addresses as stated above.
The advantage of this system is that using the .MODEL directive
and appropriate EQU statements and MACROS (which we have not
covered), it is possible to write a single subroutine which can
then be assembled in all four model configurations. Coding this
is non-trivial, but when you have done more programming you will
see how to deal with the stack using EQUs and MACROs.
For now, you want to stay with data addresses which are passed by
offset only. This is much easier. These are the SMALL and MEDIUM
models. Whether you choose NEAR or FAR procedures doesn't affect
much except where parameters are on the stack (because of that
extra CS).
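Here is a sketch of what that extra CS does to the stack. Both routines fetch one word parameter that the caller pushed just before the call. The routine names are invented, and the sketch assumes the routine itself removes the parameter on return.

```asm
get_arg_near proc near
        push bp
        mov  bp, sp
        mov  ax, [bp+4]     ; [bp] = old BP, [bp+2] = return IP
        pop  bp
        ret  2              ; near return, discard the parameter
get_arg_near endp

get_arg_far proc far
        push bp
        mov  bp, sp
        mov  ax, [bp+6]     ; [bp+2] = return IP, [bp+4] = return CS
        pop  bp
        ret  2              ; assembled as a far return
get_arg_far endp
```

The only difference in the body is the +4 versus +6, which is the kind of thing an EQU can hide when a routine has to assemble in more than one model.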
Are these .MODELS important? They are nice, but not particularly
vital. What happens is that in a manual you see a sample program
like this:
DOSSEG
.MODEL medium
.STACK
.DATA
variable1 dw 25
.CODE
sample proc
mov ax, variable1
ret
sample endp
ENDS
END
and you start comparing the size of this to the size it would be
if you used the standard segment definitions. I have news for
you. This is not a legitimate program. A legitimate program is a
page or two long.{6} Also, at least to my way of thinking, you
want visual separation between segments. The above is a
disordered presentation of segments. We want order in our
programs and the segment headers provide a visual structure. In
the text file for ASMHELP, (which is about 3600 lines long), the
SEGMENT declarations occupy about 20 lines. This is about 0.5% of
the total length of the file.
If you are going to assemble a file in multiple models, then it
is worthwhile to use the .MODEL directives, otherwise it is
optional depending more on your concept of what looks clear than
any major difference.
____________________
6. Perhaps you want to scan \COMMENTS\MISHMASH.DOC which
contains some real subroutines. They are all long.
SUMMARY
To exit a program, use INT 21h Function 4Ch
mov ah, 4Ch ; exit program
mov al, ? ; replace ? with error code
int 21h
A GROUP is a group of segments whose data will be referenced by
the offset from the beginning of the group. You declare a group
with:
DGROUP GROUP _DATA, CONST, _BSS, STACK
MASM calculates OFFSETS incorrectly with groups, so you should
either use LEA or the DGROUP override:
lea ax, variable1
mov ax, offset DGROUP:variable1
To get the address of DGROUP in DS you need:
ASSUME ds:DGROUP
and:
mov ax, DGROUP
mov ds, ax
The standardized segment definitions, along with their simplified
directives are:
DIRECTIVE REPLACEMENT TEXT
.DATA _DATA SEGMENT WORD PUBLIC 'DATA'
.CONST CONST SEGMENT WORD PUBLIC 'CONST'
.DATA? _BSS SEGMENT WORD PUBLIC 'BSS'
.STACK [size] STACK SEGMENT PARA STACK 'STACK'
.CODE _TEXT SEGMENT WORD PUBLIC 'CODE'
.CODE [name] name_TEXT SEGMENT WORD PUBLIC 'CODE'
.FARDATA [name] FAR_DATA SEGMENT PARA 'FAR_DATA'
.FARDATA? [name] FAR_BSS SEGMENT PARA 'FAR_BSS'
In addition you have the different model names:
SUBROUTINE CALL ADDRESSES PASSED AS .MODEL NAME
NEAR OFFSET SMALL
FAR OFFSET MEDIUM
NEAR SEGMENT:OFFSET COMPACT
FAR SEGMENT:OFFSET LARGE