- RELATIVE FILES MADE RELATIVELY EASY Part 1
-
- by Bill Brier
-
-
- This is part one of a three-part article explaining the use of RELative
- files in database programs.
-
- The article assumes some familiarity with BASIC and the disk drive.
-
-
-
-
-
- I. WHAT IS A RELATIVE FILE?
-
- At one time or another you have probably used a commercial database program
- to store and retrieve information on a disk. And you have probably looked
- at the file disk's directory on at least one occasion and seen a file entry
- marked with REL. You knew that the RELative file was the backbone of the
- whole database but it seemed like a mysterious, unfathomable storage
- structure. Well, this article will help you to understand and use RELative
- files and will explain some easy-to-use techniques for creating your own
- RELative file database programs.
-
- A RELative file is a special type of disk file that is structured into
- logical data units called RECORDS. Each record is assigned
- a number. These records, in turn, are structured into data units called
- FIELDS. Each field is assigned a numerical position within the record.
- Because of this unique structure, the RELative file allows the programmer to
- access just one record and, if desired, one field of that record. This
- feature makes the design of fast searching databases easy and efficient.
-
- A RELative file is the direct descendant of an older and more complex filing
- system referred to as DIRECT-ACCESS or RANDOM ACCESS. Such a system required
- that the program manage the use of the tracks and sectors on the disk, a
- complex and somewhat inefficient task. When using RELative files, the disk
- operating system (DOS) handles the track and sector usage and relieves the
- program of this task. The Commodore DOS does all of the "dirty work" for
- you, leaving you with a less daunting task to accomplish.
-
- Because of the random access nature of a RELative file, a bit of planning is
- required on the programmer's part. Let's take a look at this aspect first.
- The rest will then become considerably easier to understand.
-
- II. RECORDS and FIELDS
-
- The job of any RELative file is to store information in a form that can be
- readily used some time in the future. To do this, the file must be properly
- structured to avoid a confusing mess. Thus, the individual fields and their
- use within a record must be understood. We will use a mailing list as a
- simple and easily understood example.
-
- A record in our mailing list would consist of the following fields:
-
- First/last name
- Street address
- City
- State
- Zip code
- Area code
- Telephone number
-
- The reason for separating these items into individual fields is so that the
- program will be able to search on them. For example, the user may have a
- need to search out certain area codes. By having the area code in a
- separate field such a search can be easily and quickly performed.
-
- Once you have defined the fields that you wish to use you must next
- determine the number of characters each field will be allowed to contain.
- It is in this area that many beginners encounter difficulty. Here are
- representative lengths for the above fields:
-
- First/last name: 33
- Street address: 26
- City: 19
- State: 3
- Zip code: 6
- Area code: 4
- Telephone number: 9
-
- Before you start to write your program construct a table just like the one
- above and work out the field lengths. Then, add one extra character to each
- field. This extra character will be the field DELIMITER (normally a
- carriage return). The delimiter is the means by which your program will be
- able to determine where one field ends and the next one starts. The above
- table includes a delimiter in each length. Now, add up all of the lengths.
- If you've copied and added up my table you should arrive at a total of
- 100. This is the RECORD LENGTH in bytes. A record may not exceed 254
- bytes total length including the delimiters. If your program must use a
- greater length then you will have to use a second record to store the
- information that will not fit into the first one. Avoid this type of
- situation if at all possible.
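-
- As a quick check, the addition can be left to the computer. This is only a
- sketch: the DATA values are the lengths from the table above, and the
- variable names L and T are chosen just for illustration.
-
- 10 T=0
- 20 FOR F=1 TO 7:READ L:T=T+L:NEXT
- 30 PRINT "RECORD LENGTH:";T
- 40 IF T>254 THEN PRINT "TOO LONG FOR ONE RECORD"
- 50 DATA 33,26,19,3,6,4,9
-
- With these lengths the total is 100, so line 40 stays quiet.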
-
- With the field size and structure determined you now need to determine the
- number of records that your file should contain. You should be realistic
- about this as there is no point in structuring a file much larger than what
- is needed to do the job. Also, there is only so much storage space on a
- disk and room must be left for other uses which will be discussed later.
-
- In our mailing list program we will create a file with 500 records. To
- determine the approximate number of bytes that this file will require and
- the number of disk sectors (blocks) it will use multiply the record length
- in bytes by the number of records used and add five percent to this total.
- That will give you the storage usage of the RELative file in bytes. Divide
- that usage by 256 to get the blocks used. In our example file, we would
- figure as follows:
-
- 100 * 500 + 5% = 52500 bytes
- 52500 / 256 = 205 blocks used
-
- The five percent add-on covers disk overhead needed to maintain the RELative
- file. This overhead is called the SIDE-SECTOR CHAIN OF BLOCKS. It is an
- internal "roadmap" maintained on the disk by the drive so it can find the
- various records without a lot of lost time. Since an empty 1541-formatted
- disk has 664 free blocks, we know that our file will fit without any
- problem.
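-
- The same estimate can be worked out in BASIC. Treat this as a rough sketch:
- the names RS (record size), NR (number of records) and B (bytes) are chosen
- here for illustration, and 664 is the free block count of an empty 1541 disk
- mentioned above.
-
- 10 RS=100:NR=500
- 20 B=RS*NR*1.05:REM ADD FIVE PERCENT FOR THE SIDE SECTORS
- 30 PRINT "BYTES:";B
- 40 PRINT "BLOCKS:";INT(B/256)
- 50 IF INT(B/256)>664 THEN PRINT "WON'T FIT ON AN EMPTY DISK"
-
- For our mailing list the results should agree with the figures worked out
- by hand, about 52500 bytes and 205 blocks.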
-
- We need to do two other things and then we'll be ready to start programming.
- Each of the fields needs to have a POSITION number and a string variable
- assigned to it. The position or P number tells the drive the byte within
- the record at which the field's data starts. The string variable obviously
- will be used to store the data. The P number for the first field is 1. Adding
- the length of a field to its own P number gives the P number for the next
- field. Using our already defined fields, let's set up a table listing the
- assigned variables and P numbers:
-
- Field             Len.  Var.   P
- ===================================
- 1st/last name      33   NA$    1
- Street address     26   AD$   34
- City               19   CT$   60
- State               3   SA$   79
- Zip code            6   ZI$   82
- Area Code           4   AC$   88
- Telephone number    9   TN$   92
- ===================================
-
- If you study this table for a while a pattern will emerge. The P number for
- a given field (except the first one of course) is the length of the previous
- field plus the P number for that previous field. For example, the P number
- for the address field (AD$) is 33 (the length of NA$) plus 1 (the field
- position of NA$) or 34. The record always starts at position or byte number
- 1 and works up from there. Note that in calculating the P number the length
- of the field also includes the extra character for the delimiter. If you
- neglect to leave room for it two fields may run together and create a mess.
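-
- If you would rather not do the arithmetic by hand, a short loop can produce
- the P numbers. This is a sketch only; it assumes the field lengths (delimiter
- included) are listed in record order in the DATA statement, and the variable
- L is a name picked for this example.
-
- 10 P=1
- 20 FOR F=1 TO 7
- 30 READ L:PRINT "FIELD";F;"LENGTH";L;"P=";P
- 40 P=P+L
- 50 NEXT
- 60 PRINT "RECORD LENGTH:";P-1
- 70 DATA 33,26,19,3,6,4,9
-
- The P values printed should be 1, 34, 60, 79, 82, 88 and 92. When the loop
- finishes P is one past the end of the record, which is why line 60 prints
- P-1.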
-
- But, you say, wait a minute! The name or city or address lengths will vary
- depending on the actual information entered by the user. Yes, that is true
- but is nothing to be concerned about. If your data string is shorter than
- the maximum length allowed by the field the unused portion of the field will
- simply be along for the ride. The delimiter can be at any point within the
- field. The only thing you must be careful of is to avoid a data string
- length greater than the field length. Such an occurrence would result in two
- fields running together or an overflow in the record.
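-
- One simple safeguard is to trim a string before it is written. The lines
- below are a sketch of that idea for the name field, not the only way to do
- it. The limit of 32 is the field length of 33 less one byte for the
- delimiter, and the sample name is made up for the demonstration.
-
- 10 NA$="A VERY LONG NAME THAT WILL NOT FIT IN THE FIELD"
- 20 IF LEN(NA$)>32 THEN NA$=LEFT$(NA$,32)
- 30 PRINT NA$:PRINT LEN(NA$)
-
- The other fields would be handled the same way, each with its own limit.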
-
- III. CREATING A NEW RELATIVE FILE
-
- When your program first starts out there won't be a RELative file to work
- with. Therefore you must create one on the file disk. Unlike SEQuential
- files, RELative files make no distinction between reading and writing to a
- file. A RELative file can be OPENed, written to, read from and CLOSEd
- without any special gymnastics on the part of your program. You do need to
- create a file full of "dummy" records to start with, though. This file
- initially won't contain any data but will contain all of the records you
- will need to store your data in. Before we create our new file however we
- must become acquainted with some programming techniques required to
- manipulate a RELative file.
-
- When reading or writing a RELative file you must have two disk files
- simultaneously OPENed. One of them will, of course, be the file associated
- with the RELative file. The other will be for communications on the error
- channel (channel 15). For our purposes we will use file #1 for the error
- channel and file #2 for the RELative file itself. The data being stored or
- retrieved in the RELative file will be passed using file #2 and DOS error
- messages and commands will be passed using file #1. Also, you must always
- OPEN the error channel first. With that settled, let's OPEN our new file:
-
- OPEN1,8,15:OPEN2,8,2,"0:MAILING LIST,L,"+CHR$(100)
-
- This sequence OPENs the error channel using file #1 and opens the RELative
- file called "MAILING LIST" using file #2. Note the syntax. The '0:'
- specifies drive 0. You must ALWAYS specify the drive number even on a
- single drive. The ',L,' tells the DOS that this is a RELative file and the
- '+CHR$(100)' tells the DOS that the record length of the file is 100 bytes.
- Now I think you can see why we constructed those tables earlier. You needed
- that information to tell the drive the size of the file records.
-
- This OPENs the file but doesn't really do anything else. We still need to
- create 500 empty records for later use. To do that we must POSITION the
- drive to the last or highest record in the file and write something to that
- record. This action will cause the DOS to create all 500 records and
- allocate space on the disk for the whole file. While it is not absolutely
- necessary that you do this, the drive will handle the file much faster if
- all of the records are created in advance. Otherwise, if you position to a
- record that isn't present the drive has to create it and any records in
- between, a time-consuming process.
-
- The position command is passed to the drive through the error channel. This
- is why the error channel must be opened while using a RELative file. Here
- is the syntax for the position command (assuming that file #1 is the error
- channel and file #2 is the RELative file):
-
- PRINT#1,"P"CHR$(98)CHR$(RL)CHR$(RH)CHR$(P)
-
- Let's take a closer look at this sequence. PRINT#1 sends the following
- commands through the error channel to the DOS. The "P" tells the DOS that
- this is a RELative file position command. CHR$(98) is the channel number
- (secondary address) of the RELative file plus 96. We used 2 as the third
- number in OPEN2,8,2, so the value is 98. CHR$(RL) and CHR$(RH) are the
- record number in low byte/high byte format. It is figured as follows (the
- variable J is assumed to be the record number):
-
- RH=INT(J/256):RL=J-RH*256
-
- Since we are creating a new file with 500 records the variable J will be set
- to equal 500 (J=500). Using the little equation above a value of 500 for J
- will result in RL=244 and RH=1. Try it!
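-
- Here is a small sketch for trying the equation on several record numbers at
- once. The three values in the DATA statement are simply examples.
-
- 10 FOR F=1 TO 3:READ J
- 20 RH=INT(J/256):RL=J-RH*256
- 30 PRINT J,RL,RH
- 40 NEXT
- 50 DATA 1,255,500
-
- Records 1 and 255 both give RH=0, with RL equal to the record number.
- Record 500 gives RL=244 and RH=1 as before. To go the other way, the record
- number is RL+RH*256.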
-
- CHR$(P) is the field position within the selected record. Note that in our
- table we constructed earlier, the variable P is used as the position number
- for a given field. Does this make a little more sense now? For the
- purposes of creating the new file P should be set equal to 1 (P=1). With
- the appropriate values set we can then send the position command to the
- drive.
-
- Arrgghh!!! What's this??? The drive error light just started flashing.
- Now what???
-
- Immediately after sending the position command you must read the error
- channel for any possible error messages that the drive may send back:
-
- INPUT#1,X,X$,Y,Z
-
- The variable X will contain the error number if any, X$ will contain a plain
- English error message, and Y and Z will contain the track and sector number
- where the error occurred (if applicable). In our example, the drive will
- return 50, RECORD NOT PRESENT. Why, you say? Well, because record 500
- ISN'T PRESENT! It's a new file, remember? Well, what do we do?
-
- To clear the error condition we must write something to record 500. This
- will force the DOS to create records 1 through 499 as well as 500. To write
- to the 500th record use this syntax:
-
- PRINT#2,"END OF FILE"
-
- The drive will run for several minutes and you'll hear all sorts of wild
- activity going on inside. That's the drive head whipping back and forth as
- it creates the side sectors and continuously updates the BAM (block
- allocation map). When all of that activity is done your new file will be
- ready to go! Incidentally, you don't have to use "END OF FILE" as the
- string to write to the last record. Anything will do.
-
- Following the PRINT#2 sequence you should read the error channel once more
- to verify that the drive isn't in trouble again. If a problem does occur it
- will most likely be error 52, FILE TOO LARGE. This will occur if
- insufficient disk space is available to accommodate the file. The reason for
- calculating the estimated file size becomes a bit more obvious now, doesn't
- it?
-
- Finally you must close up your files, RELative file first followed by the
- error channel:
-
- CLOSE2:CLOSE1
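-
- Put together, the steps above might look like the short program below. It is
- a sketch assembled from the statements already shown, not a finished utility.
- The line numbers are arbitrary, and the extra error check in line 30 is an
- addition of mine in case the OPEN itself fails. It has to be RUN as a program
- because INPUT# is not allowed in direct mode.
-
- 10 OPEN1,8,15
- 20 OPEN2,8,2,"0:MAILING LIST,L,"+CHR$(100)
- 30 INPUT#1,X,X$,Y,Z:IF X>19 THEN PRINT X;X$:GOTO 100
- 40 J=500:P=1:RH=INT(J/256):RL=J-RH*256
- 50 PRINT#1,"P"CHR$(98)CHR$(RL)CHR$(RH)CHR$(P)
- 60 INPUT#1,X,X$,Y,Z:REM 50, RECORD NOT PRESENT IS EXPECTED HERE
- 70 PRINT#2,"END OF FILE"
- 80 INPUT#1,X,X$,Y,Z
- 90 IF X>19 THEN PRINT "DISK ERROR";X;X$
- 100 CLOSE2:CLOSE1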
-
- The above sequence of operations that we have just performed only has to be
- used once. After the file has been created we can simply OPEN and read
- and/or write to it without having to specify record length. It is still
- necessary to position the drive however. We will give that aspect of using
- the file further consideration when we start to actually read and write
- data.
-
- We now have a RELative file on our disk with 500 empty records ready to
- accept data. Before we can start using these records let's set up a few
- useful subroutines and variables for our program to work with.
-
- IV. SUBROUTINES
-
- There are certain operations that will be repeatedly used within your
- program. For that reason it is wise to structure them as subroutines to
- facilitate the execution of your program. These operations are: positioning
- the drive, reading the error channel, OPENing files and CLOSEing files.
- Also, the read and write operations are usually more convenient if
- structured as subroutines.
-
- It is also convenient to use certain variable names to represent various
- pieces of information that will be floating around in your program. Here
- are the variables and their functions that I use:
-
- VAR.  FUNCTION
- =======================================
- I     RECORD INDEX LOOP VARIABLE
- J     RECORD NUMBER
- P     FIELD POSITION IN RECORD
- X     DOS ERROR NUMBER
- X$    DOS ERROR MESSAGE
- EF    ERROR FLAG...1=ERROR 0=NO ERROR
- F     MULTI-PURPOSE LOOP VARIABLE
- Q     CONSTANT EQUAL TO 256
- =======================================
-
- As we further progress into more advanced file handling we will see where
- all of these variables will become useful. You of course can use any
- variable names that you like.
-
- In my RELative file programs I generally use certain line numbers in
- connection with the various subroutines required to manipulate the file.
- Here is a list of these line numbers:
-
- LINE  FUNCTION
- =======================================
- 6900  OPEN ERROR CHANNEL
- 7000  OPEN EXISTING RELATIVE FILE
- 7100  CLOSE ALL FILES
- 8100  POSITION DRIVE TO RECORD & FIELD
- 8200  READ ERROR CHANNEL
- 8300  WRITE TO RECORD
- 8400  READ FROM RECORD
- =======================================
-
- You may find that other line numbers may be more convenient. In all of the
- discussions that follow from this point on the above variables and line
- numbers will be referred to.
-
- Here are the recommended syntaxes for these lines except lines 8300 through
- 8499 (we will continue to use the file numbers that we've used up until now):
-
- 6900 OPEN1,8,15:RETURN
-
- 7000 OPEN2,8,2,"0:MAILING LIST":GOSUB8200:RETURN
-
- 7100 CLOSE2:CLOSE1:RETURN
-
- 8100 RH=INT(J/Q):RL=J-RH*Q
- 8110 PRINT#1,"P"CHR$(98)CHR$(RL)CHR$(RH)CHR$(P)
-
- 8200 EF=0:INPUT#1,X,X$:IFX>19THENEF=1
- 8210 RETURN
-
- Observe that the routine for positioning the drive "falls through" to the
- routine that reads the error channel. This is simply a matter of
- programming convenience. Also notice the syntax in line 7000. Once the
- RELative file has been created your program does not need to specify the
- record length. The DOS will automatically figure it out by itself. You
- must always check the error channel after OPENing the file however, just in
- case it was accidentally SCRATCHed from the disk.
-
- In line 8100 we first calculate the record low byte/high byte values and
- then use them as variables to position the drive. The constant Q represents
- the value 256, a value that is frequently used in RELative file programs,
- and must be set (Q=256) before the subroutine is first called. Commodore
- BASIC generally evaluates a statement faster when it uses a variable than
- when it must convert a number written out as digits in the program text.
-
- In 8200 we first clear the error flag EF and then read the error channel.
- If the value of X is greater than 19 an error of some type has occurred and
- the error flag EF is set to one. It is easier and faster to test EF with an
- expression such as IF EF THEN.... than to evaluate X every time an error
- check is needed.
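-
- To see how the pieces fit together, here is a sketch of a main program that
- calls the subroutines. It assumes they have been entered at the line numbers
- given in the table above (6900, 7000, 7100, 8100 and 8200) and that the file
- has already been created. Lines 10 through 70 and the messages are made up
- for this example.
-
- 10 Q=256
- 20 GOSUB 6900:GOSUB 7000
- 30 IF EF THEN PRINT "CAN'T OPEN FILE: ";X$:GOTO 60
- 40 J=1:P=1:GOSUB 8100
- 50 IF EF THEN PRINT "POSITION ERROR: ";X$
- 60 GOSUB 7100
- 70 END
-
- The END in line 70 keeps the program from running on into the subroutines.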
-
- Well, digest this information and do a little experimenting. In RELATIVE
- FILES MADE RELATIVELY EASY part 2 we'll learn how to actually read and write
- the records and how to do field searches.
-
- W.J. Brier
- Feb. 1986
-
-