
RELATIVE FILES MADE RELATIVELY EASY
Part 1
by Bill Brier

This is part one of a three-part article explaining the use of RELative files in database programs. The article assumes some familiarity with BASIC and the disk drive.

I. WHAT IS A RELATIVE FILE?

At one time or another you have probably used a commercial database program to store and retrieve information on a disk. You have probably also looked at the disk directory at least once and seen a file entry marked REL. You knew that the RELative file was the backbone of the whole database, but it seemed like a mysterious, unfathomable storage structure. This article will help you understand and use RELative files, and will explain some easy-to-use techniques for creating your own RELative file database programs.

A RELative file is a special type of SEQuential disk file that is structured into logical data units called RECORDS. Each record is assigned a number. These records, in turn, are structured into data units called FIELDS. Each field is assigned a numerical position within the record. Because of this structure, a RELative file allows the programmer to access just one record and, if desired, one field of that record. This makes the design of fast-searching databases easy and efficient.

The RELative file is the direct descendant of an older and more complex filing system referred to as DIRECT ACCESS or RANDOM ACCESS. Such a system required the program to manage the use of the tracks and sectors on the disk, a complex and somewhat inefficient task. When you use RELative files, the disk operating system (DOS) handles track and sector usage and relieves the program of this task. The Commodore DOS does all of the "dirty work" for you, leaving you with a far less daunting job.

Because of the random-access nature of a RELative file, a bit of planning is required on the programmer's part. Let's take a look at this aspect first.
The rest will then become considerably easier to understand.

II. RECORDS AND FIELDS

The job of any RELative file is to store information in a form that can readily be used at some time in the future. To do this, the file must be properly structured to avoid a confusing mess, so the individual fields and their use within a record must be understood. We will use a mailing list as a simple, easily understood example. A record in our mailing list would consist of the following fields:

     First/last name
     Street address
     City
     State
     Zip code
     Area code
     Telephone number

The reason for separating these items into individual fields is so that the program will be able to search on them. For example, the user may need to search for certain area codes. With the area code in a separate field, such a search can be performed easily and quickly.

Once you have defined the fields you wish to use, you must next determine the number of characters each field will be allowed to contain. It is in this area that many beginners encounter difficulty. Here are representative lengths for the above fields:

     First/last name:   33
     Street address:    26
     City:              19
     State:              3
     Zip code:           6
     Area code:          4
     Telephone number:   9

Before you start to write your program, construct a table just like the one above and work out the field lengths. Then add one extra character to each field. This extra character will hold the field DELIMITER (normally a carriage return), which is how your program determines where one field ends and the next one starts. The lengths in the table above already include the delimiter: a two-letter state abbreviation, for example, needs 2 characters plus 1 for the delimiter, for a total of 3.

Now add up all of the lengths. If you've copied and added up my table, you should arrive at a total of 100. This is the RECORD LENGTH in bytes. A record may not exceed 254 bytes in total length, including the delimiters. If your program must use a greater length, you will have to use a second record to store the information that will not fit into the first one.
Avoid this type of situation if at all possible.

With the field sizes and structure determined, you now need to decide how many records your file should contain. Be realistic about this: there is no point in structuring a file much larger than what is needed to do the job. Also, there is only so much storage space on a disk, and room must be left for other uses which will be discussed later. In our mailing list program we will create a file with 500 records.

To estimate the number of bytes this file will require and the number of disk sectors (blocks) it will use, multiply the record length in bytes by the number of records and add five percent to the result. That gives the storage usage of the RELative file in bytes. Divide that usage by 256 to get the number of blocks used. For our example file:

     100 * 500 = 50000 bytes, plus 5% = 52500 bytes
     52500 / 256 = about 205 blocks used

The five percent add-on covers the disk overhead needed to maintain the RELative file. This overhead is called the SIDE-SECTOR CHAIN OF BLOCKS. It is an internal "roadmap" that the drive maintains on the disk so it can find the various records without a lot of lost time. Since an empty 1541-formatted disk has 664 free blocks, we know that our file will fit without any problem.

We need to do two other things, and then we'll be ready to start programming. Each of the fields needs to have a POSITION number and a string variable assigned to it. The position, or P, number tells the drive where within the record the field's data is located. The string variable, of course, will be used to hold the data. To calculate P numbers, add the length of a field to its starting position in the record; the result becomes the P number for the next field. Using the fields we have already defined, let's set up a table listing the assigned variables and P numbers:

     Field               Len.   Var.    P
     ===================================
     First/last name      33    NA$     1
     Street address       26    AD$    34
     City                 19    CT$    60
     State                 3    SA$    79
     Zip code              6    ZI$    82
     Area code             4    AC$    88
     Telephone number      9    TN$    92
     ===================================

If you study this table for a while, a pattern will emerge. The P number for a given field (except the first one, of course) is the length of the previous field plus the P number of that previous field. For example, the P number for the address field (AD$) is 33 (the length of NA$) plus 1 (the field position of NA$), or 34. The record always starts at position, or byte number, 1 and works up from there.

Note that in calculating the P number, the length of the field also includes the extra character for the delimiter. If you neglect to leave room for it, two fields may run together and create a mess.

But, you say, wait a minute! The name, city, or address lengths will vary depending on the actual information entered by the user. Yes, that is true, but it is nothing to be concerned about. If your data string is shorter than the maximum length allowed by the field, the unused portion of the field will simply be along for the ride; the delimiter can be at any point within the field. The only thing you must be careful of is to avoid a data string that, together with its delimiter, is longer than the field. Such an occurrence would result in two fields running together or an overflow in the record.

III. CREATING A NEW RELATIVE FILE

When your program first starts out, there won't be a RELative file to work with, so you must create one on the file disk. Unlike SEQuential files, RELative files make no distinction between reading and writing. A RELative file can be OPENed, written to, read from and CLOSEd without any special gymnastics on the part of your program. You do, however, need to create a file full of "dummy" records to start with. This file initially won't contain any data, but it will contain all of the records you will need to store your data in.
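Before creating the file, it may help to double-check the layout arithmetic from Section II away from the computer. The sketch below is a modern Python model, not Commodore BASIC and not part of the author's programs. It derives the P numbers and the record length from the field table, applies the five-percent rule of thumb for the size estimate, and builds a 100-byte record image with a carriage-return delimiter after each field. The function names are hypothetical, and filling unused bytes with spaces is an assumption made for illustration; the sketch does not model what the drive actually stores in unused bytes.

```python
# A Python model of the mailing-list record layout (illustrative only;
# the real file is created and filled by Commodore BASIC and the DOS).

CR = "\r"  # field delimiter: carriage return, CHR$(13)

# (field, variable, length including the delimiter) from the field table
FIELDS = [
    ("First/last name", "NA$", 33),
    ("Street address", "AD$", 26),
    ("City", "CT$", 19),
    ("State", "SA$", 3),
    ("Zip code", "ZI$", 6),
    ("Area code", "AC$", 4),
    ("Telephone number", "TN$", 9),
]


def field_positions(fields):
    """P numbers: each field starts where the previous one ends (1-based)."""
    positions = {}
    p = 1  # a record always starts at byte 1
    for _name, var, length in fields:
        positions[var] = p
        p += length
    return positions


def record_length(fields):
    """Sum of the field lengths, delimiters included."""
    return sum(length for _name, _var, length in fields)


def estimated_size(rec_len, num_records):
    """Rule of thumb from the text: bytes plus 5 percent, then / 256 for blocks."""
    total_bytes = rec_len * num_records * 105 // 100  # integer math avoids float noise
    return total_bytes, total_bytes / 256


def fits(data, length):
    """The data plus its delimiter must fit inside the field."""
    return len(data) + 1 <= length


def pack_record(values, fields):
    """Build one record image; unused bytes are spaces (an assumption of this model)."""
    record = [" "] * record_length(fields)
    positions = field_positions(fields)
    for _name, var, length in fields:
        data = values.get(var, "")
        if not fits(data, length):
            raise ValueError(f"{var} holds at most {length - 1} characters")
        start = positions[var] - 1  # convert the 1-based P number to a 0-based index
        record[start:start + len(data) + 1] = list(data + CR)
    return "".join(record)


def read_field(record, p):
    """Return the data starting at P number p, up to (not including) its delimiter."""
    start = p - 1
    return record[start:record.index(CR, start)]


if __name__ == "__main__":
    print(field_positions(FIELDS))
    print(record_length(FIELDS))     # 100
    print(estimated_size(100, 500))  # (52500, 205.078125)
```

Because `pack_record` checks `fits` before placing each field, an over-long entry raises an error instead of spilling into the next field, which is exactly the situation the text warns about. The estimate of about 205 blocks is well under the 664 free blocks of an empty 1541 disk.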
Before we create our new file, however, we must become acquainted with some programming techniques required to manipulate a RELative file. When reading or writing a RELative file, you must have two disk files OPEN simultaneously. One of them will, of course, be the RELative file itself. The other will be for communications on the error channel (channel 15). For our purposes we will use file #1 for the error channel and file #2 for the RELative file. The data being stored in or retrieved from the RELative file will be passed through file #2, and DOS error messages and commands will be passed through file #1. Also, you must always OPEN the error channel first. With that settled, let's OPEN our new file:

     OPEN1,8,15:OPEN2,8,2,"0:MAILING LIST,L,"+CHR$(100)

This sequence OPENs the error channel using file #1 and OPENs the RELative file called "MAILING LIST" using file #2. Note the syntax. The '0:' specifies drive 0; you must ALWAYS specify the drive number, even on a single drive. The ',L,' tells the DOS that this is a RELative file, and the '+CHR$(100)' tells the DOS that the record length of the file is 100 bytes. Now you can see why we constructed those tables earlier: you need that information to tell the drive the size of the file's records.

This OPENs the file but doesn't really do anything else. We still need to create 500 empty records for later use. To do that we must POSITION the drive to the last, or highest-numbered, record in the file and write something to that record. This action will cause the DOS to create all 500 records and allocate space on the disk for the whole file. While this is not absolutely necessary, the drive will handle the file much faster if all of the records are created in advance. Otherwise, if you position to a record that isn't present, the drive has to create it and any records in between, a time-consuming process.

The position command is passed to the drive through the error channel.
This is why the error channel must be open while you are using a RELative file. Here is the syntax for the position command (assuming that file #1 is the error channel and file #2 is the RELative file):

     PRINT#1,"P"CHR$(98)CHR$(RL)CHR$(RH)CHR$(P)

Let's take a closer look at this sequence. PRINT#1 sends what follows through the error channel to the DOS. The "P" tells the DOS that this is a RELative file position command. CHR$(98) is the channel number of the RELative file plus 96. The channel number is the secondary address given in the OPEN statement (the last 2 in OPEN2,8,2); since our file number is also 2, this works out to 2 + 96 = 98. CHR$(RL) and CHR$(RH) are the record number in low byte/high byte format, figured as follows (the variable J is assumed to hold the record number):

     RH=INT(J/256):RL=J-RH*256

Since we are creating a new file with 500 records, the variable J will be set to 500 (J=500). Using the little equation above, a value of 500 for J results in RL=244 and RH=1 (1*256 + 244 = 500). Try it!

CHR$(P) is the field position within the selected record. Note that in the table we constructed earlier, the variable P is used as the position number for a given field. Does this make a little more sense now? For the purposes of creating the new file, P should be set to 1 (P=1).

With the appropriate values set, we can send the position command to the drive. Arrgghh!!! What's this??? The drive error light just started flashing. Now what???

Immediately after sending the position command, you must read the error channel for any error message the drive may send back:

     INPUT#1,X,X$,Y,Z

The variable X will contain the error number, if any; X$ will contain a plain-English error message; and Y and Z will contain the track and sector numbers where the error occurred (if applicable). In our example, the drive will return 50, RECORD NOT PRESENT. Why, you say? Because record 500 ISN'T PRESENT! It's a new file, remember? So what do we do? To clear the error condition we must write something to record 500. This will force the DOS to create records 1 through 499 as well as 500.
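The byte arithmetic of the position command can be checked the same way. This Python sketch is a model only, with hypothetical function names: it splits a record number into low and high bytes exactly as RH=INT(J/256):RL=J-RH*256 does, and assembles the five bytes that the PRINT#1 statement above sends, namely "P", the channel number plus 96, the record low byte, the record high byte, and the field position. The range check only guards the two-byte split; it does not claim to know the drive's actual record limits.

```python
def split_record_number(j):
    """Low/high byte split, mirroring RH=INT(J/256):RL=J-RH*256."""
    if not 0 <= j <= 65535:
        raise ValueError("record number must fit in two bytes")
    rh = j // 256
    rl = j - rh * 256
    return rl, rh


def position_command(channel, j, p=1):
    """The five bytes sent by PRINT#1,"P"CHR$(channel+96)CHR$(RL)CHR$(RH)CHR$(P)."""
    rl, rh = split_record_number(j)
    return bytes([ord("P"), channel + 96, rl, rh, p])


if __name__ == "__main__":
    print(split_record_number(500))        # (244, 1)
    print(list(position_command(2, 500)))  # [80, 98, 244, 1, 1]
```

To reach the street address of record 7, for instance, you would pass J=7 and P=34 from the field table, giving the bytes 80, 98, 7, 0, 34.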
To write to the 500th record, use this syntax:

     PRINT#2,"END OF FILE"

The drive will run for several minutes, and you'll hear all sorts of wild activity going on inside. That's the drive head whipping back and forth as it creates the side sectors and continuously updates the BAM (block allocation map). When all of that activity is done, your new file will be ready to go! Incidentally, you don't have to use "END OF FILE" as the string written to the last record; anything will do.

Following the PRINT#2 statement, you should read the error channel once more to verify that the drive isn't in trouble again. If a problem does occur, it will most likely be error 52, FILE TOO LARGE. This will occur if insufficient disk space is available to accommodate the file. The reason for calculating the estimated file size becomes a bit more obvious now, doesn't it?

Finally, you must close your files, RELative file first, followed by the error channel:

     CLOSE2:CLOSE1

The sequence of operations we have just performed only has to be used once. After the file has been created, we can simply OPEN it and read and/or write to it without having to specify the record length. It is still necessary to position the drive, however. We will consider that aspect of using the file further when we start to actually read and write data.

We now have a RELative file on our disk with 500 empty records ready to accept data. Before we start using these records, let's set up a few useful subroutines and variables for our program to work with.

IV. SUBROUTINES

There are certain operations that will be used repeatedly within your program. For that reason it is wise to structure them as subroutines. These operations are: positioning the drive, reading the error channel, OPENing files and CLOSEing files. The read and write operations are also usually more convenient if structured as subroutines.
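To make the error-channel subroutine concrete, here is a Python model of what it does with the drive's reply. It assumes the reply arrives as one comma-separated line holding the error number, the message, the track and the sector, which is what INPUT#1,X,X$,Y,Z implies; the sample lines are illustrative, not captured from a drive, and the names `DriveStatus` and `parse_status` are hypothetical. The flag EF follows the rule used in line 8200 below: numbers 0 through 19 are not errors, anything above 19 is.

```python
from typing import NamedTuple


class DriveStatus(NamedTuple):
    """One reply from the error channel, as read by INPUT#1,X,X$,Y,Z."""
    x: int        # DOS error number (X)
    message: str  # plain-English message (X$)
    track: int    # Y, where applicable
    sector: int   # Z, where applicable

    @property
    def ef(self) -> int:
        """Error flag: 1 if X > 19 (an error), else 0."""
        return 1 if self.x > 19 else 0


def parse_status(line: str) -> DriveStatus:
    """Split a reply such as '50,RECORD NOT PRESENT,00,00' into its four parts.

    Assumes a well-formed line with exactly four comma-separated items.
    """
    x, message, track, sector = (part.strip() for part in line.split(",", 3))
    return DriveStatus(int(x), message, int(track), int(sector))


if __name__ == "__main__":
    status = parse_status("50,RECORD NOT PRESENT,00,00")
    if status.ef:
        print(f"error {status.x}: {status.message}")  # error 50: RECORD NOT PRESENT
```

Testing a single flag with `if status.ef:` is the same convenience the text describes for `IF EF THEN...`: the comparison against 19 is made once, when the reply is read.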
It is also convenient to use certain variable names to represent the various pieces of information that will be floating around in your program. Here are the variables I use and their functions:

     VAR.  FUNCTION
     =======================================
     I     RECORD INDEX LOOP VARIABLE
     J     RECORD NUMBER
     P     FIELD POSITION IN RECORD
     X     DOS ERROR NUMBER
     X$    DOS ERROR MESSAGE
     EF    ERROR FLAG...1=ERROR 0=NO ERROR
     F     MULTI-PURPOSE LOOP VARIABLE
     Q     CONSTANT EQUAL TO 256
     =======================================

As we progress into more advanced file handling, we will see where all of these variables become useful. You can, of course, use any variable names you like.

In my RELative file programs I generally use certain line numbers for the various subroutines required to manipulate the file. Here is a list of these line numbers:

     LINE  FUNCTION
     =======================================
     6900  OPEN ERROR CHANNEL
     7000  OPEN EXISTING RELATIVE FILE
     7100  CLOSE ALL FILES
     8100  POSITION DRIVE TO RECORD & FIELD
     8200  READ ERROR CHANNEL
     8300  WRITE TO RECORD
     8400  READ FROM RECORD
     =======================================

You may find other line numbers more convenient. In all of the discussions that follow, the above variables and line numbers will be referred to. Here are the recommended forms for these lines, except the write and read routines at lines 8300 through 8499, which we will get to in part 2 (we will continue to use the file numbers we've used up till now):

     6900 OPEN1,8,15:RETURN
     7000 OPEN2,8,2,"0:MAILING LIST":GOSUB8200:RETURN
     7100 CLOSE2:CLOSE1:RETURN
     8100 RH=INT(J/Q):RL=J-RH*Q
     8110 PRINT#1,"P"CHR$(98)CHR$(RL)CHR$(RH)CHR$(P)
     8200 EF=0:INPUT#1,X,X$:IFX>19THENEF=1
     8210 RETURN

Observe that the routine for positioning the drive "falls through" to the routine that reads the error channel. This is simply a matter of programming convenience. Also notice the syntax in line 7000. Once the RELative file has been created, your program does not need to specify the record length.
The DOS will automatically figure it out by itself. You must always check the error channel after OPENing the file, however, just in case the file was accidentally SCRATCHed from the disk.

In line 8100 we first calculate the record's low byte/high byte values and then use them to position the drive. The variable Q holds the constant 256, a value frequently used in RELative file programs, so be sure your program sets Q=256 before the first GOSUB8100. A BASIC statement will always evaluate faster using a variable than it will using the ASCII representation of the number.

In line 8200 we first clear the error flag EF and then read the error channel. If the value of X is greater than 19, an error of some type has occurred and the error flag EF is set to one. It is easier and faster to test EF with an expression such as IF EF THEN... than to evaluate X every time an error check is needed.

Well, digest this information and do a little experimenting. In RELATIVE FILES MADE RELATIVELY EASY part 2 we'll learn how to actually read and write the records and how to do field searches.

W.J. Brier
Feb. 1986