home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
- Maximizing Performance with dBase II and III
-
- by Alan Simpson
-
- Reprinted from "Personal Systems",
- The Journal of the San Diego Computer Society
- Issues of November and December 1984 and January 1985
- Typed and Formatted by Rick Yount, Oak Harbor, WA
- (206) 675-9797
-
- This article may be reprinted for any non-commercial purpose;
- please credit the Author and "Personal Systems".
-
- TABLE OF CONTENTS:
-
- Trimming Minutes Down to Seconds ........... 1
- Don't Resort to Re-Sorting ................. 2
- Faster Sorts ............................... 4
- Faster Searching ........................... 4
- Faster Math ................................ 5
- Faster Reports ............................. 6
- Faster Copying ............................. 6
- Faster Edits ............................... 7
- Searching for Ranges ....................... 8
- Estimating Performance ..................... 8
- Managing Multiple Index Files .............. 9
- Trade-Offs ................................ 10
- Technical Rationale ....................... 12
-
-
- Everyone wants the most out of their computer. Granted, it's
- nice that the computer can trim down to minutes what usually
- requires humans hours to perform. But then, nobody complains if
- those computer minutes can be trimmed down to seconds. In this
- paper we'll discuss and compare general techniques for maximizing
- the performance of dBase. We'll trim some of those long dBase
- processing minutes down to quick dBase seconds.
-
-
- TRIMMING MINUTES DOWN TO SECONDS
-
- We'll begin by benchmarking (comparing) a few techniques for
- performing some basic tasks with dBase II. We'll use a simple
- mailing list database as an example. Its name is MAIL.DBF, and it
- has this structure:
-
- FLD NAME TYPE WIDTH DEC
-
- 001 LNAME C 020 000
- 002 FNAME C 020 000
- 003 ADDRESS C 020 000
- 004 CITY C 020 000
- 005 STATE C 010 000
- 006 ZIP C 010 000
- 007 AMOUNT N 009 002
- First, let's discuss various methods for sorting this database:
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 1
-
-
-
-
-
-
-
- DON'T RESORT TO RE-SORTING
-
- There are basically four methods to maintain a sorted list of
- items in a database. The first is to simply CREATE the MAIL
- database, add records to it, then sort it with the SORT command.
- For example, suppose you CREATE MAIL and APPEND 999 records to it.
- To add another record to it, and maintain an alphabetical listing
- by last name, you would need to use the commands:
-
- USE MAIL
- APPEND
- SORT ON LNAME TO TEMP
-
- This procedure would require about 940 seconds to re-sort a
- database with 1000 records in it.
-
- Another method would be to USE MAIL, APPEND the new record,
- then INDEX the file on the LNAME field, using the commands below:
-
- USE MAIL
- APPEND
- INDEX ON LNAME TO NAMES
-
- This procedure requires about 530 seconds to re-sort the
- 1000-record database back into last name order
-
- A third method would be to locate the position in the
- database that the new record belongs, and INSERT a new record into
- its proper alphabetical place, as below:
-
- USE MAIL
- LIST (to find the insertion point)
- INSERT
-
- The time required would be however long it takes you to find
- the appropriate place to insert the new record, plus about 124
- seconds for the INSERT command to rearrange the database.
-
- The fourth method is to create an index file of the field to
- sort on, and keep that index file active while adding the new
- record(s). In most cases, the best time to index a file is
- immediately after creating it, as in the commands below:
-
- CREATE MAIL
- USE MAIL
- INDEX ON LNAME TO NAMES
-
- It only takes about two seconds to create the index file when
- the database is empty. At this point, the MAIL.DBF database and
- the LNAME.NDX files exist on disk, but both are empty. To add new
- data, you would need to make the NAMES.NDX file active by typing
- in the commands:
-
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 2
-
-
-
-
-
-
- USE MAIL INDEX NAMES
-
- If you always USE MAIL INDEX NAMES when you add new data, the
- index file will automatically be updated, therefore the data will
- always be sorted. So to add one record to the MAIL database with
- 999 records in it, you would type the commands:
-
- USE MAIL INDEX NAMES
- APPEND
-
- There is no need to go through the INDEX ON procedure again,
- because the NAMES.NDX file was active during the appending
- procedure. The time required to automatically re-sort the index
- file using this method is a scant 3 seconds. Add to that the
- original two seconds for the INDEX ON command to create the
- initial index file for a total of 5 seconds. Table 1 compares the
- processing times for the four methods (using a 16-bit computer
- with 256K RAM, floppies, and dBase II Version 2.4):
-
- -------------------------------------------------------
- Method Commands Used Time Required
-
- 1 USE, APPEND and SORT 940 seconds
- 2 USE, APPEND and INDEX ON 530 seconds
- 3 USE, INSERT 129 seconds
- 4 USE file1 INDEX file2 APPEND 5 seconds
- -------------------------------------------------------
- Table 1: Sorting Times; Four Different Approaches
-
- Remember, you need to first create an index file based upon
- the field(s) you wish the database to be sorted by. Use the INDEX
- ON command to create the index (.NDX) file. For example, to
- maintain an alphabetical listing of people on the MAIL database by
- last name, type in the commands:
-
- USE MAIL
- INDEX ON LNAME TO NAMES
-
- This creates and stores an index file called NAMES.NDX on
- disk. To make the index file active, specify its name in the USE
- command, as below:
-
- USE MAIL INDEX NAMES
-
- Once the file is made active in this way, any changes to the
- database, whether they be through APPEND, EDIT, BROWSE, REPLACE,
- PACK or READ will automatically re-sort and adjust the index file
- accordingly.
-
- Avoiding the re-sorting process on only one way in which
- index files can greatly improve the speed of a dBase II software
- system. Most types of processing that involve searching can also
- be accelerated.
-
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 3
-
-
-
-
-
-
- FASTER SORTS
-
- In some cases, a database should be physically sorted rather
- than indexed. For example, the UPDATE command works best if the
- file you are updating FROM is physically pre-sorted. You could
- use the SORT command to create a sorted database called TEMP from
- the MAIL database using the commands:
-
- USE MAIL
- SORT ON LNAME TO TEMP
-
- This approach takes about 940 seconds, or about 15 minutes of
- processing time. If the NAMES index file already exists, you can
- achieve the same result using the commands:
-
- USE MAIL INDEX NAMES
- COPY TO TEMP
-
- Copying the contents of the indexed file only required about
- 326 seconds of processing time; about one-third the time. When
- you COPY an indexed file to another database, the records on the
- database you copy to will be physically sorted.
-
-
- FASTER SEARCHING
-
- Let's assume that the MAIL.DBF database already has 1,000
- records on it. Ten individuals on this database have the last
- name (LNAME field) "Miller". The question is: Just how long
- does it take to LIST, COUNT, or COPY all the "Miller"'s to another
- database; or how long does it take to print a formatted REPORT
- with only "Miller"'s, or to SUM the amounts for the "Miller"'s?
- The answer, of course, is: It depends on how you do it.
-
- For our benchmark comparisons, we'll assume that the database
- has already been indexed on the LNAME field, using the commands:
-
- USE MAIL
- INDEX ON LNAME TO NAMES
-
- Let's begin by comparing processing times using two different
- command files and approaches. The first method will use the
- standard LIST FOR approach to fish out all the "Miller"'s. The
- command file looks like this:
-
- ********** Method 1: LIST FOR approach
-
- ERASE
- USE MAIL INDEX NAMES
- ACCEPT " List all with what last name? " to SEARCH
- LIST FOR LNAME=SEARCH
-
- When you DO this command file, it clears the screen and
- displays the prompt:
-
- List all with what last name?
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 4
-
-
-
-
-
-
-
- Suppose you type in the name "Miller" and press the RETURN
- key. The program will then display the records of all ten
- "Miller"'s on the screen. The time required for the command file
- to display the "Miller"'s and return to the dot prompt is about
- 148 seconds on a floppy disk system. Almost two and a half
- minutes.
-
- A second approach to solve this problem is to use the FIND
- command to look up the first "Miller" in the NAMES index, then use
- the WHILE option to display the remaining "Miller"'s in the
- database, as in the program below:
-
- ********** Method 2: FIND and LIST WHILE approach
-
- ERASE
- ACCEPT " List all with what last name? " to SEARCH
- FIND &SEARCH
- LIST WHILE LNAME = SEARCH
-
- Processing time for the second method to display all 10
- "Miller"'s then redisplay the dot prompt on the screen, is less
- than 9 seconds. Table 2 compares processing times for the two
- different methods. Both methods perform exactly the same task,
- but the processing times vary dramatically.
-
- -------------------------------------------------------
- Method Commands Used Time Required
-
- 1 LIST FOR 148.75 seconds
- 2 FIND and LIST WHILE 8.94 seconds
- --------------------------------------------------------
- Table 2: Comparison of Processing Times; Two Methods
-
-
- FASTER MATH
-
- The FIND and WHILE approach with indexed databases can offer
- significant time savings with dBase commands other than LIST. For
- example, if you want dBase to COUNT how many "Miller"'s are in the
- database, you can use the commands:
-
- USE MAIL
- COUNT FOR LNAME = "Miller"
-
- This approach requires about 55.5 seconds to display the fact
- that there are 10 "Miller"'s in the database, then redisplay the
- dot prompt. You can cut this time down significantly using the
- commands:
-
- USE MAIL INDEX NAMES
- FIND Miller
- COUNT WHILE LNAME = "Miller"
-
- This approach performs the same task in about 3.20 seconds;
- quite a significant time savings.
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 5
-
-
-
-
-
-
-
- Suppose you wish to SUM the AMOUNT field for just the
- "Miller"'s? You could use these commands:
-
- USE MAIL
- SUM AMOUNT FOR LNAME = "Miller"
-
- Processing time using this method is about 58 seconds. The
- alternative approach uses these commands:
-
- USE MAIL INDEX NAMES
- FIND "Miller"
- SUM AMOUNT WHILE LNAME = "Miller"
-
- This approach takes 5.28 seconds; less than one-tenth the
- time.
-
-
- FASTER REPORTS
-
- You can use the WHILE command with the REPORT command, also.
- For example, suppose you've already created a formatted report
- called MAILIST using the REPORT command. To display all the
- "Miller"'s on the formatted report, you could use the commands:
-
- USE MAIL
- REPORT FORM MAILIST FOR LNAME = "Miller"
-
- This approach requires about 135.3 seconds to display all the
- "Miller"'s on the report, then redisplay the dot prompt. The
- faster approach uses these commands:
-
- USE MAIL INDEX NAMES
- FIND Miller
- REPORT FORM MAILIST WHILE LNAME = "Miller"
-
- These commands perform the same job in a slim 14.03 seconds.
-
-
- FASTER COPYING
-
- For copying portions of the MAIL database to a database
- called TEMP, the commands:
-
- USE MAIL INDEX NAMES
- COPY TO TEMP FOR LNAME = "Miller"
-
- require a hefty 200.6 seconds; more than three minutes. You can
- perform the same job using these commands:
-
- USE MAIL INDEX NAMES
- FIND Miller
- COPY TO TEMP WHILE LNAME = "Miller"
-
- These commands trim the copying time down to a comfortable
- 18.7 seconds.
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 6
-
-
-
-
-
-
- FASTER EDITS
-
- Indexed files can also speed up the editing process. For
- example, suppose you want to BROWSE through the database to edit
- data for one of the "Miller"'s. One approach would be to type in
- the commands:
-
- USE MAIL
- BROWSE
-
- These commands will set up the dBase BROWSE screen with the
- names and addresses in their original order, as displayed in
- Figure 1:
-
-
- LNAME--------------------FNAME------------------ADDRESS
- Bond James 007 Spy St.
- Kenney Dave 123 Clark St.
- Newell Jeff 341 Lou Drive
- Mohr Richard 350 W. Leadora St.
- Rosiello Rick 999 Buddy Way
- Wallace Doug 345 Killer St.
- Miller Mike 601 Lemon Dr.
-
- Figure 1: An unindexed BROWSE screen.
-
- You will need to use lots of CTRL-C commands to scroll
- through the database to find the "Miller" you're looking for.
- There's no telling how long it might take you to find the
- particular "Miller" you wish to edit, because the "Miller"'s will
- be placed randomly throughout the database.
-
- If, on the other hand, you use these commands to BROWSE:
-
- USE MAIL INDEX NAMES
- FIND Miller
- BROWSE
-
- The BROWSE screen will display the first "Miller" on the
- database, and all the remaining "Miller"'s immediately beneath, as
- shown in Figure 2:
-
-
- LNAME--------------------FNAME------------------ADDRESS
- Miller Mollie 601 Mission Blvd.
- Miller Shiela 1234 Genessee
- Miller Ms. Stephanie S. 734 Rainbow Dr.
- Miller Patti 626 Mazda Way
- Miller George P.O. Box 2802
- Miller Julie 999 Love St.
- Miller Caron 123 Princess Way
- Miller Ms. Chrissie 321 Hynde St.
- Miller Mrs. Sally S. 325 Seco Ct.
- Miller Dr. James T. 701 Newport Dr.
-
- Figure 2: A BROWSE screen with an indexed database
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 7
-
-
-
-
-
-
- No need to go scrolling through pages and pages of BROWSE
- screens to find the "Miller" you wish to edit, because all ten
- "Miller"'s are displayed immediately and simultaneously on one
- BROWSE screen.
-
- Similarly, if you wish to use the EDIT command to change the
- data for a particular "Miller", you can use the commands:
-
- USE MAIL INDEX NAMES
- FIND Miller
- EDIT WHILE LNAME = "Miller"
-
- These commands will quickly display the first "Miller" in the
- database on the EDIT screen. Each subsequent CTRL-C command will
- immediately position you to the next "Miller" in the database. So
- once again, you can save a great deal of time by not having to
- scroll around and search for individual "Miller"'s.
-
-
- SEARCHING FOR RANGES
-
- The FIND and WHILE approach can also be used for searching
- for ranges of data. For example, if a database had a DATE field
- with dates stored in MM/DD/YY format, and the database was indexed
- on the date field, the following command file would list all
- records between the requested starting and ending dates:
-
- USE file INDEX datefield
- STORE " " TO START, FINISH
- @ 2,2 SAY "Enter Start Date " GET START PICTURE "99/99/99"
- @ 4,2 SAY "Enter End Date " GET FINISH PICTURE "99/99/99"
- READ
-
- FIND &START
- LIST WHILE DATE <= FINISH
- (This command file assumes that all dates have the same
- year. It's a little trickier when the data is spread across
- several years
-
-
- ESTIMATING PERFORMANCE
-
- Performance can be an important issue in developing a custom
- dBase II software system. Generally, processing times tend to
- increase linearly with the size of a database. Therefore, on a
- database with 5,000 records in it, displaying all the "Miller"'s
- could take as long as 740 seconds, a little over 12 minutes, with
- the LIST FOR approach. Using an indexed file with the FIND and
- LIST WHILE approach will perform the same task in about 45
- seconds, less than a minute. Double those times for a database
- with 10,000 record in it: A whopping 24 minutes (almost a half
- hour) with the FOR approach, vs. 88 seconds (under a minute and a
- half) with the FIND and WHILE approach.
-
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 8
-
-
-
-
-
-
-
- Keep in mind that any dBase command that allows the FOR
- option will allow the WHILE option instead (e.g. LIST, DISPLAY,
- COPY, REPORT, SUM, COUNT, REPLACE, DELETE, TOTAL). Therefore, any
- of these processes can be greatly expedited with an index file and
- the FIND and WHILE commands.
-
-
- MANAGING MULTIPLE INDEX FILES
-
- In the previous examples, we compared processing times using an
- index file called NAMES. This index file contains only the LNAME
- field. Realistically, a mailing list will probably require two
- separate sort orders; 1) a sort by last name and first name for
- printing a directory listing, and 2) a sort by zip code for bulk
- mailings. In that case, you need to create two index files to
- manage the two different sort orders. One index file, which we'll
- call NAMES, will keep the mailing list data in last name and first
- name order. A second index file, which we'll call ZIPS, will
- maintain a zip code order for handling bulk mailings. To create
- both of these index files, you'll need to first CREATE the
- MAIL.DBF database with the CREATE command, then immediately create
- the two index files using the commands:
-
- USE MAIL
- INDEX ON LNAME + FNAME TO NAMES
- INDEX ON ZIP TO ZIPS
-
- The MAIL.DBF database now has two index files associated with
- it; NAMES.NDX and ZIPS.NDX. You can keep both index files active
- by using the command:
-
- USE MAIL INDEX NAMES,ZIPS
-
- By specifying two index files in this fashion, all future
- modifications to the database with the APPEND, EDIT, BROWSE, READ,
- or REPLACE commands will * * AUTOMATICALLY * * update both
- index files.
-
- If you LIST or DISPLAY ALL the records in the database, they
- will be displayed in last name order, since NAMES is the first-
- listed index file in the INDEX portion of the command line.
-
- Furthermore, you can only use the FIND command with the first
- listed index, NAMES. Therefore, to display all the records in the
- database in zip code order, you need not go through the INDEX ON
- ZIP TO ZIPS procedure again. Instead, you can simply type in the
- command:
-
- SET INDEX TO ZIPS
-
- This eliminates the time required to sort the file again.
- Before adding new records or editing the database, be sure to use
- the commands:
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 9
-
-
-
-
-
-
- USE MAIL INDEX NAMES,ZIPS
- or
- USE MAIL INDEX ZIPS,NAMES
-
- This is required to make both index files active again.
- Otherwise, you'll be likely to get a RECORD OUT OF RANGE error
- sometime in the not too distant future.
-
-
- TRADE-OFFS
-
- There are some trade-offs to contend with when using multiple
- index files. Generally, the more index files you have active at
- any given moment, the longer it takes to perform an APPEND, EDIT
- or REPLACE procedure. For example, if you want to add records to
- the MAIL database without any active index files, you can just use
- the commands:
-
- USE MAIL
- APPEND
-
- As you type in each new individual's data, the screen will
- immediately accept each new record and re-display the next APPEND
- screen. However, if you decide to get carried away and create
- four index files, and keep them all active as:
-
- USE MAIL INDEX NAMES, ZIPS, CITIES, STATE
-
- You will notice a definite delay between the time you fill one
- APPEND screen and the appearance of the next blank APPEND screen.
- On a large data file with over 1,000 records in it, the delay
- could be as much as 20 seconds, depending on how many fields are
- in each index file and the RAM capacity of your computer.
-
- In general, one or two active index files are sufficient for
- most databases. The delays caused by one or two active index
- files are relatively insignificant, and are more than compensated
- for by the time savings that the FIND and WHILE commands offer, as
- well as by the time savings gained by avoiding re-indexing.
-
- Another disadvantage to indexed files occurs during global
- replaces. For example, if for some reason you wished to set all
- the AMOUNT fields back to zero in your hypothetical MAIL database,
- you could use the commands:
-
- USE MAIL INDEX NAMES,ZIPS
- REPLACE ALL AMOUNT WITH 0
-
- On a database with 1000 records in it, this process could
- easily take 45 minutes. However, the commands above waste
- processing time by updating the index files when it is not
- necessary to do so.
-
-
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 10
-
-
-
-
-
-
- Recall that the NAMES index contains the LNAME and FNAME
- fields, and the ZIPS index contains the ZIP field. The AMOUNT
- field is not used in either index file. Therefore you can use the
- NOUPDATE option (supplied in dBase Versions 2.4 and higher) to
- perform the replace. These commands:
-
- USE MAIL INDEX NAMES,ZIPS
- REPLACE NOUPDATE ALL AMOUNT WITH 0
-
- perform the update in about six minutes. The NOUPDATE option
- informs dBase that, even though there are two active index files,
- there is no need to update them while performing this REPLACE
- command.
-
- There are some important points to keep in mind about the
- INDEX and FIND commands. First, the FIND command only works on an
- indexed field. If a database is in use with multiple index files,
- the FIND command only works with the first listed index file. For
- example, if you open the MAIL data base with the two index files
- as below:
-
- USE MAIL INDEX ZIPS, NAMES
-
- the FIND command can only be used to locate a zip code.
-
- If the data to look up in a database is stored in a variable,
- then the variable name must be "macro-ized" to be used with the
- FIND command, as below:
-
- ACCEPT "Look up whom? " to SEARCH
- FIND &SEARCH
-
- Also, FIND does not support functions or operators. That is,
- you cannot use the commands: FIND Miller .OR. Smith or FIND
- !(&SEARCH) nor FIND LNAME > &SEARCH.
-
- If you create two index files, but later add, edit, or delete
- data with only one or neither of the index files active, the index
- files will be corrupted, and you will most likely get a RECORD OUT
- OF RANGE error at a later time. In this case, both index files
- must be re-created by typing in the commands:
-
- USE MAIL
- INDEX ON LNAME + FNAME TO NAMES
- INDEX ON ZIP TO ZIPS
-
- or . . .
-
- USE MAIL INDEX NAMES,ZIPS
- REINDEX
-
- Again, you can avoid the re-indexing by always keeping both
- index files active with the command USE MAIL INDEX NAMES, ZIPS
- when working with the database.
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 11
-
-
-
-
-
-
- TECHNICAL RATIONALE
-
- From a technical standpoint, the reason that the FIND and
- WHILE approach always dramatically outperforms the FOR approach is
- quite simple. Whenever you use the FOR option to perform a
- search, dBase always starts accessing the records from record
- number 1 and reads every single record directly from disk.
- Therefore, if you had a database with 10,000 names in it, ten of
- which were "Miller", dBase would perform 10,000 disk accesses to
- display the 10 "Miller"'s. Ten thousand disk accesses takes a
- very, very long time.
-
- Many unnecessary disk accesses can be avoided by the fact
- that dBase always stores an active index file in RAM (at least, as
- much of it as will fit in RAM). Furthermore, the index file is
- always in sorted order in RAM.
-
- When you use the FIND command with an index file, dBase finds
- the first "Miller" in the index file in RAM, which requires no
- disk accesses. Furthermore, the WHILE option only searches WHILE
- (as long as) the search condition is true. Therefore, dBase will
- stop searching as soon as it encounters the first non-"Miller" in
- the index file. Since the index is already in sorted order,
- "Miller"'s are all clumped together and disk accesses will stop as
- soon as all the "Miller"'s have been displayed.
-
- So the FOR command requires 10,000 disk accesses to display
- the ten "Miller"'s, while the FIND and WHILE approach only
- requires 10 disk accesses; thereby eliminating 9,990 unnecessary
- disk accesses and about 20 or 30 minutes of time depending on the
- size of the database, the amount of RAM your computer has, and the
- speed of your disk drives.
-
- - EOF -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ----------------------------------------------------------------
- Maximizing dBase II and III Performance - by Alan Simpson
- Page 12
-
-
- √