- CP/M Assembly Language
- Part I: Assembler Basics
- by Eric Meyer
-
- I first discovered assembly language about two years ago, when I needed
- to modify the source code for a public domain modem program for
- an unusual application.
- Since then, I've gone on to write a number of programs in
- assembler, ranging from some simple public domain utilities to
- the memory resident utility PRESTO!.
- For many such applications, assembler is the language of
- choice: it's very compact and fast; it's the most efficient way
- to do simple tasks that deal with moving around bytes of data,
- such as copying and modifying files; and it allows the most
- sophisticated interfacing with the CP/M operating system, which
- is itself written in assembler.
- Another nice thing is that you already have all the tools
- that you need to learn and use assembly language: nothing more to
- buy, unless your needs grow to be very sophisticated.
- CP/M 2.2 includes the ASM assembler; CP/M 3.0 comes with MAC
- and RMAC. All you lack is instructions. Let me quickly mention
- two good books on the subject: CP/M Assembly Language
- Programming, and The Soul of CP/M. Both, while not complete
- language references, put a lot of emphasis on programming in the
- CP/M environment, which will have you doing truly useful things
- (like manipulating disk files) in short order.
- Both are far more comprehensive than I can attempt to be
- here; I will just present an introduction, and explain some
- basic concepts for those who would like to become literate in
- assembler.
- Numbers play an important role in all that follows.
- Basically, everything in the computer is (or is represented as)
- numbers -- such as the instructions that make up a program, or
- the operating system itself; characters of text and other data
- that you may be manipulating; addresses in memory where various
- data or subroutines can be found; and so on.
- Only the context determines whether a particular value is to
- be interpreted as a number, an ASCII character, part of an
- address, or a machine instruction.
- This can be very powerful, but it's also potentially very
- confusing. (Pascal aficionados may need a strong drink before
- proceeding.)
- All numbers in what follows are decimal, unless followed by
- an "H" (for Hexadecimal, base 16) or "B" (for Binary, base 2).
- Hexadecimal is commonly used in assembly language programming, as
- it's the most natural representation for the numbers from 0 to
- 255 (or 65535) that your computer manipulates on the most
- fundamental level.
- If you're unfamiliar with these base systems, you may want
- to find or make a conversion chart for reference.
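- In place of a paper chart, the conversions can be sketched in a
- few lines of a modern language. This is only an illustration in
- Python, not part of any CP/M toolset; the function names to_hex,
- to_bin and parse are made up for this example.
-
```python
# Convert between decimal, hexadecimal ("H" suffix) and binary ("B" suffix),
# using the notation adopted in this article.

def to_hex(n, digits=2):
    """Format n as hexadecimal with an 'H' suffix, e.g. 201 -> 'C9H'."""
    return format(n, "0{}X".format(digits)) + "H"

def to_bin(n, bits=8):
    """Format n as binary with a 'B' suffix, e.g. 201 -> '11001001B'."""
    return format(n, "0{}b".format(bits)) + "B"

def parse(text):
    """Parse '201', 'C9H' or '11001001B' into an integer."""
    text = text.strip().upper()
    if text.endswith("H"):          # hexadecimal always carries the H suffix
        return int(text[:-1], 16)
    if text.endswith("B"):          # otherwise a trailing B means binary
        return int(text[:-1], 2)
    return int(text)                # no suffix: decimal

for n in (0, 201, 255, 65535):
    wide = n > 255                  # use 4 hex digits / 16 bits for words
    print(n, to_hex(n, 4 if wide else 2), to_bin(n, 16 if wide else 8))
```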
-
-
- 1. The CPU
- The CPU (central processing unit) is the integrated circuit
- at the heart of your computer. It fetches your instructions,
- executes them, and keeps track in the meantime (via "interrupts")
- of all the other tasks your computer needs to have done.
- Most CP/M computers today use the Z80 CPU, though some still
- use the 8080 (or 8085), which are very similar but don't have
- quite as many instructions.
- These "8-bit" CPUs deal primarily with "bytes", numeric
- values from 0 to 255 (11111111B, or FFH); though two bytes
- together can also be used as a 16-bit "word", a value from 0 to
- 65535 (FFFFH).
- In this manner, up to 64K (64 times 1024, or 65536, bytes)
- of memory can be addressed. Part of this memory will be holding
- the CP/M operating system; part will contain the transient
- program that is actually running at the moment; and part will
- remain available as data storage space for that program.
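- The arithmetic behind bytes, words and the 64K limit can be
- checked with a short Python sketch. The helper names make_word
- and split_word are invented here for illustration.
-
```python
ADDRESS_BITS = 16   # a 16-bit word is used as a memory address

def make_word(high, low):
    """Combine a high byte and a low byte into one 16-bit word."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_word(word):
    """Split a 16-bit word into (high byte, low byte)."""
    return (word >> 8) & 0xFF, word & 0xFF

print(make_word(0xFF, 0xFF))   # largest word: 65535 (FFFFH)
print(split_word(0x0100))      # (1, 0)
print(2 ** ADDRESS_BITS)       # number of distinct addresses: 65536
print(64 * 1024)               # the same figure, written as 64K
```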
-
-
- 2. Assembly Language
- The CPU has a moderate number of "instructions", each of
- which performs some simple but useful task: adding two values,
- fetching a byte of data from memory, and so on. Each instruction
- is "coded" by one (or possibly several) bytes, according to an
- arbitrary system. For example, C9H (201) is the "return"
- instruction, which marks the end of a subroutine.
- On the earliest microcomputers, programs were entered as a
- series of such numbers, often with a row of eight mechanical
- switches: thus the sequence "on, on, off, off, on, off, off, on"
- would represent 11001001B, or C9H.
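- As a small illustration, the switch settings can be turned into
- a byte value in Python. This sketch assumes the leftmost switch
- is the most significant bit, which matches the order in which the
- sequence is read above; switches_to_byte is a made-up name.
-
```python
def switches_to_byte(switches):
    """Build a byte from eight switch positions, leftmost bit first."""
    if len(switches) != 8:
        raise ValueError("expected exactly eight switches")
    value = 0
    for position in switches:
        # Shift the bits collected so far and append the next one.
        value = (value << 1) | (1 if position == "on" else 0)
    return value

row = ["on", "on", "off", "off", "on", "off", "off", "on"]
code = switches_to_byte(row)
print(code, format(code, "02X") + "H")   # 201 C9H
```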
- This was incredibly tedious. Today, having plenty of memory
- available to work with, you can write assembly language like any
- other language, using an editor to create a text file; a special
- program, the assembler, will translate the statements you write
- (e.g., the mnemonic "RET" for return) into the appropriate
- machine code.
- The assembler functions very much like a compiler for a
- higher-level language. The difference is that a language compiler
- will incorporate prewritten library routines to perform many
- common tasks, and allow you to do very complex things with just
- a few statements. Thus when you write something like:
-
- 100 INPUT "DIAMETER:",D
- 110 PRINT "CIRCUMFERENCE IS:",3.14159*D
-
- you are actually invoking a whole set of routines (part of your
- BASIC interpreter or compiler) that prints messages on the
- screen, gets input from the keyboard, stores and retrieves data
- values in memory, performs floating point arithmetic, and so on.
- When you program in assembler, you have to write every
- single CPU instruction yourself. This can be a lot of work, since
- the CPU can basically do two things: move a byte from one place
- to another; and add, subtract, and do logical operations like
- "and" and "or" with byte values from 0 to 255.
- Are you wondering how you would do floating point
- multiplication (C=3.14159*D) using an instruction set so
- primitive that it can only add and subtract integers from 0 to
- 255? The answer is that if you are sane, you wouldn't. There are
- tasks well suited to assembly language, and others best done in
- higher level languages. (Somebody has already written the
- floating point code that's part of your BASIC interpreter; take
- advantage of it.)
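- To give a feel for how larger operations are built up from byte
- arithmetic, here is a Python sketch that adds two 16-bit values
- using only 8-bit additions and a carry, in the manner of an add
- followed by an add-with-carry. It is a model of the idea only;
- the names add8 and add16 are invented for this example.
-
```python
def add8(a, b, carry_in=0):
    """Add two bytes plus a carry; return (8-bit result, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8

def add16(x, y):
    """Add two 16-bit words one byte at a time, low byte first."""
    low, carry = add8(x & 0xFF, y & 0xFF)            # low bytes
    high, carry = add8(x >> 8, y >> 8, carry)        # high bytes + carry
    return (high << 8) | low, carry

print(add8(200, 100))        # (44, 1): the sum 300 does not fit in a byte
print(add16(0x00FF, 0x0001)) # (256, 0): the carry moved into the high byte
```
-
- Even this simple addition takes several steps, which is why
- floating point is better left to existing library code.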
- In assembler, stick to fundamentally lower level tasks, such
- as talking to your computer hardware (like memory and I/O ports),
- and manipulating disk files with the CP/M BDOS calls. For these
- purposes there is no better "language".
-
-
- 3. The Assembler
- There are several common assemblers, but they all work in
- similar ways. CP/M 2.2's ASM is a good example of a basic 8080
- assembler. MAC is a macro assembler, meaning that it lets you
- designate frequently-used blocks of code as "macros", and invoke
- them with a single name, much as you would a function call in
- another language -- this is just a convenience.
- RMAC is a relocatable macro assembler, meaning that it can
- produce output in a format that can be installed to run in
- different parts of memory as circumstances require; the usual
- assembler output is code intended to run only at address 0100H,
- the beginning of the TPA (transient program area) under CP/M.
- (This is not something you are going to need to worry about at
- first.)
- Many commercial assemblers are also available, such as
- Microsoft's M80. Generally these are even more powerful, and
- frequently they can also take advantage of the expanded
- instruction set of the Z80 CPU.
- My personal favorites are SLR Systems' SLRMAC (8080) and
- Z80ASM, both of which are incredibly fast relocatable assemblers,
- and can also generate COM files directly. But unless you get as
- heavily involved in assembly language as I have recently, it
- won't much matter which you use. The common procedure is:
-
- 1) Write the source code with your favorite text editor.
- 2) Run the assembler, typically producing a HEX output file.
- 3) Generate an executable (COM) file from the HEX file.
-
- The first step will require learning the assembler
- instruction set. The second is usually as easy as typing A>ASM
- PROG<cr>; see your computer documentation for (probably minimal)
- instructions on assembler usage. The third is done using the
- HEXCOM utility under CP/M 3.0, or LOAD and SAVE under CP/M 2.2
- (though a fine public domain utility called MLOAD is much easier
- than this combination).
-
-
- 4. Practical Tasks
- Before we get into real assembler programming, it's
- worthwhile to note that frequently, what you need to do is not
- actually to write a program from scratch, but simply to get an
- existing program running the way you want. Good public domain
- utilities, for example, often allow a number of features to be
- changed, to allow proper operation on different computers, or
- just to conform to different tastes.
- At the simplest level, the program's DOC file may just give
- a list of patching addresses. For example, the instructions for
- the (imaginary) XYZED text editor might include this information:
-
- ADDRESS VALUE
- 0130H create BAKup files? (00=no, FF=yes)
- 0131H copy buffer size in bytes (0...3000H)
-
- This indicates, for example, that you can get XYZED to
- create backup files or not, as you like, by changing a particular
- byte in the COM file. The easiest way to do this is to edit
- XYZED.COM with a utility like EDFILE, PATCH, or DU; find the
- value at address 0130H; and change it, if necessary, to what you
- want. That's all you have to do; XYZED must be designed to
- check the value it has at 0130H, and adjust its behavior
- accordingly.
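- One detail worth spelling out: because a COM file is loaded at
- address 0100H, the byte at address 0130H is found 30H bytes into
- the file. The Python sketch below models such a patch on an
- in-memory copy of the file; the image used here is a stand-in of
- zero bytes, and file_offset and patch_byte are hypothetical names,
- not functions of EDFILE, PATCH, or DU.
-
```python
TPA_BASE = 0x0100   # CP/M loads a COM file starting at this address

def file_offset(address):
    """Translate a memory address into an offset within the COM file."""
    if address < TPA_BASE:
        raise ValueError("address lies below the start of the TPA")
    return address - TPA_BASE

def patch_byte(image, address, value):
    """Return a copy of the COM image with one byte changed."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    offset = file_offset(address)
    if offset >= len(image):
        raise ValueError("address lies beyond the end of the file")
    patched = bytearray(image)
    patched[offset] = value
    return bytes(patched)

image = bytes(256)                        # stand-in for the file contents
patched = patch_byte(image, 0x0130, 0xFF) # turn the backup flag on
print(format(file_offset(0x0130), "02X") + "H")   # 30H
```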
- Sometimes the installation process can be more complex.
- Modem programs, for example, typically have to have very
- different basic routines to talk to the I/O hardware of different
- computers. Here there will often be a whole "overlay": an
- assembler source file containing an actual listing of portions of
- the program.
- You will have to edit this file, then assemble it and merge
- it with the rest of the COM file. This can require knowledge of
- some basic assembly language, but sometimes it can also be as
- simple as changing data values.
- Let's begin by considering a handful of simple assembler
- directives. These are not actually CPU instructions at all; they
- are merely instructions to the assembler, regarding where to put
- code, and the insertion of data values. You will see these used
- frequently in overlay files.
-
-
- 5. Assembler Directives
-
- ORG (origin): tells the assembler the address in memory at which
- the following code, or data, should be put. Most programs,
- for example, begin with "ORG 0100H", since transient CP/M
- programs load in at address 0100H, the beginning of the TPA.
-
- END: marks the end of an assembler source file.
-
- EQU (equate): assigns a numerical value to a label. This isn't a
- "variable", as its value cannot change, and it generates no
- output code; it's merely a convenience.
-
- DB, DW (define byte, define word): like the "DATA" statement in
- BASIC, instructs the assembler simply to insert the
- following numerical values at the current address in memory.
- Presumably the program is going to refer to them as data at
- some point.
-
-
- Consider the XYZED program again. Instead of merely giving a
- table of patch information to go by, as described above, it might
- have provided you with an overlay file XYZEDOV.ASM which would
- include the following instructions:
-
- ;XYZEDOV.ASM installation overlay
- YES EQU 0FFH
- NO EQU 0
- ORG 0130H
- BAKFLG: DB YES ;create BAK files?
- ; (yes or no)
- BUFSIZ: DW 0800H ;copy buffer size,
- ; in bytes
- END
-
- The semicolon ";", like REM in BASIC, indicates that the
- rest of the line is simply a comment, to be ignored by the
- assembler.
- The two EQUates tell the assembler to substitute the number
- FFH (255) everywhere "YES" occurs in what follows, and 0 for
- "NO".
- Not only is this convenient; it also makes the code more
- understandable, by making it clear that a value is logical
- (yes/no), rather than just an arbitrary number (like 255).
- This kind of thing always helps in assembly language, which
- is prone to be very confusing otherwise.
- The ORG statement tells the assembler that the following
- code or data is to be put starting at address 0130H in memory. In
- this case, XYZED.COM expects to find these data items at this
- address.
- The labels "BAKFLG:" and "BUFSIZ:" are just for the purpose
- of identification here, though in an actual program, labels can
- function as names for variables or subroutines, as we'll see
- later.
- The "DB YES" inserts one byte of data (in this case "YES",
- or FFH) at the current address (in this case 0130H, set by the
- ORG statement).
- The "DW 0800H" inserts a word (two bytes) of data at the
- current address (now 0131H, since the previous byte went at
- 0130H). In fact, two-byte values are stored "backwards" or low
- byte first, so the assembler is actually going to put the 00H at
- address 0131H, and then the 08H at 0132H. So this file has
- instructed the assembler to set up the following sequence of
- three data bytes:
-
- ADDRESS DATA
- 0130H FFH
- 0131H 00H
- 0132H 08H
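-
- The same byte layout can be reproduced with a short Python
- sketch that imitates what DB and DW do. The functions db and dw
- are stand-ins written for this example; they are not how ASM
- itself is implemented.
-
```python
def db(value):
    """Emit one byte, as a DB directive would."""
    return bytes([value & 0xFF])

def dw(value):
    """Emit a 16-bit word, low byte first, as a DW directive would."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])

YES = 0xFF            # mirrors "YES EQU 0FFH"
origin = 0x0130       # mirrors "ORG 0130H"

data = db(YES) + dw(0x0800)   # BAKFLG, then BUFSIZ

for i, byte in enumerate(data):
    print("{:04X}H  {:02X}H".format(origin + i, byte))
```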
-
- If you now assemble this file, with a command like
-
- A> asm xyzedov<cr>
-
- you will get an output file XYZEDOV.HEX which contains the HEX
- version of this code, a compact (though still ASCII text) format
- frequently used as an intermediary between source code and
- (unreadable) machine code. If you looked at the HEX file, you
- would see something like this:
-
- :03013000FF0008C5
-
- which can be read as "three bytes, starting at address 0130, as
- follows: FF, 00, 08". (The last value on the line is just a
- checksum byte for safety.)
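- For the curious, the layout of such a line can be sketched in
- Python. The record consists of a byte count, a two-byte address,
- a record type (00 for data), the data bytes, and a checksum chosen
- so that all the bytes on the line add up to zero in the low eight
- bits. The names build_record and parse_record are invented for
- this illustration.
-
```python
def checksum(body):
    """Two's complement of the low byte of the sum of all record bytes."""
    return (-sum(body)) & 0xFF

def build_record(address, data, record_type=0):
    """Build one HEX line from a load address and some data bytes."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF,
                  record_type]) + bytes(data)
    return ":" + (body + bytes([checksum(body)])).hex().upper()

def parse_record(line):
    """Return (address, record_type, data) after verifying the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("a HEX record must start with a colon")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("byte count does not match record length")
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:-1]

record = build_record(0x0130, [0xFF, 0x00, 0x08])
print(record)
print(parse_record(record))
```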
- You can then use a utility like MLOAD to merge this HEX file
- with the program XYZED.COM itself:
-
- A> mload xyzed.com=xyzed.com,xyzedov.hex<cr>
-
- and you will have a new copy of the XYZED program, with the
- values changed accordingly.
-
-
- 6. Coming Up. . .
- In future installments we'll learn about the 8080 CPU and
- its instruction set, and explain how to use CP/M BDOS calls.