Beijing Paradise BBS Backup

home *** CD-ROM | disk | FTP | other *** search

/ Beijing Paradise BBS Backup / PARADISE.ISO / software / BBSDOORW / ALF20WB.ZIP / ALF.ZIP / QSORT.ZIP / QSORT.DOC next >

Wrap

Text File | 1986-10-25 | 22.5 KB | 424 lines

QSORT - Version 2.1 Copyright (c) 1985, 1986 - Ben Baker - All rights reserved QSORT may be freely copied and distributed for non-commercial purposes, provided that 1) it is distributed under the name "QSORT," and 2) this documentation file always accompanies it. Vendors wishing to distribute QSORT commercially, or with commercial products may contact the author at the address below for terms. QSORT is made available under the "share-ware" concept. If you find this program useful, you are asked to send its author a donation commensurate with its value to you ($20 would be nice). This will encourage the development of other useful, affordable tools. Send checks to: Ben Baker RR #1, Box 637 E. Alton, Il 62024 Bug reports or suggestions may be sent to the above address or via FidoNet mail to Fido node # 100/76, Baker's Acre, or by logging into Baker's Acre at 618-251-2169. Purpose: This filter command reads text data from the standard input device, sorts it and writes it to the standard output device. Optionally, it may be used to sort a file "in place." Format: QSORT[L] [filespec] [/R] [[/Flen] | [/Mlen]] [/T[tag]] | [/[L][+|-][col][:width]. . . Type: Internal External **** Remarks: Sorts are done using the ASCII collating sequence, or optionally using lexical sequence. Tab characters are not expanded with blanks. Files containing fixed-length records may be sorted, but | key fields must be ASCII strings. Fixed length records | need not be terminated by a CR/LF character pair, but the | lines in all other files must end with CR/LF. | Up to 30 key field parameters, /[L][+|-][col][:width] may be used to specify key fields and are ordered major to minor from left to right. The 'L', if present, specifies "lexical" sequence for this field. (See discussion of lexical sequence below.) The minus (-) sign specifies reverse order for this field. The plus (+) sign (or no sign) specifies normal sort order. If present, [col] defines the beginning column of the field. If omitted, column 1 is assumed. If present, [:width] defines the field width in columns. If width is omitted, the rest of the record is assumed. If no key field parameters are given, the entire record is the key field. Any key field which falls beyond the end of a particular record is treated as a null (zero length) field for that record, and collates low. The /Mlen parameter allows optionally specifying the maximum record length. It may lie in the range of /M132 to /M65535. If this parameter is present, it must appear before any field parameters. If it is omitted, 132 is the default. NOTE: Maximum record length may be larger than any record in the file to be sorted, but excessive maximum record length will unecessarily degrade the performance of QSORT. The /Flen parameter defines the record length for a file of | fixed-length records. All records in the input file MUST | be exactly <len> bytes long. The records need not (but | may) be terminated with a CR/LF sequence. They may contain | any data, even binary date, but the key fields must be | ASCII strings. Strings may be terminated with a null | (binary zero) character, or may be padded with trailing | spaces to the full length of the field. Note that QSORT | does not attempt to support Pascal style strings. These | are strings which begin with a character whose binary value | is a character count. This is followed by <count> | characters of ASCII data, which in turn is followed by | random data out to the maximum length of the string. These | strings may be used as key fields, but the programmer must | insure that either the last real character is a null | character, or the field is padded to its full length with | spaces. QSORT must be told that the key begins in the | second character position (the first character of real | data). | | The /Mlen and /Flen parameters are mutually exclusive. If | either is specified, it must appear before the first key | field parameter. It is an error to specify both. | The /R parameter is included for compatibility with DOS SORT and is redundant. It reverses the sense of sort direction for all key fields. If filespec is included in the command, any redirection specified will be ignored. Disk and path specifiers may be included in filespec and will be used for both input and output. QSORT will sort the input file defined by filespec to a file with an extent of .$$$. On successful completion of the sort, the input file is deleted and the output file is renamed to filespec, accomplishing, in effect, an "in-place" sort. The /T[tag] parameter, if present, indicates that the "records" to be sorted may be more than a single line long. If "tag" is also present, it defines a character to be used to tag the "end-of-record." If "tag" is not present, the first empty line terminates the record. For this purpose, "empty" means "no characters." A line containing but a single space is NOT empty! A line may be "tagged" by placing the tag character anywhere on the last line of a logical record. The entire line, including the tag character will appear as the last line of the record. The /Mlen or /Flen parameter and key field parameters must | observe the relative ordering defined above, but the other parameters may appear anywhere and in any order on the command line. Examples: Produce a directory listing sorted by creation date and time, and display it on the console a screen's worth at a time: A>DIR | QSORT /30:2 /24:5 /39 /34:5 | MORE The output of the DIR command is piped to QSORT. The key fields defined are, from left to right (major to minor), year (2 digits), month and day, AM/PM flag and time. The output of QSORT is then piped to MORE for display. Next, replace the unsorted FILE.TXT with the same data sorted in reverse order. Use columns 10 to 16 as the sort key: A>QSORT FILE.TXT /-10:7 or A>QSORT FILE.TXT /10:7 /R or (SORT compatible) A>QSORT FILE.TXT /R /+10 Next, perform a simple sort on a file with 240 byte records A>QSORT LARGE.REC /M240 GLOSS.TXT is an unsorted glossary of terms. The term being defined by each entry appears first, followed by several lines of definition, not exceeding 1000 characters. The entries are separated by empty lines. Produce GLOSS.SRT, a sorted version of the glossary: A>QSORT /T /M1000 <GLOSS.TXT >GLOSS.SRT A lawyer keeps a running log of his billable activities in TIME.LOG. The first line of each entry is "mm/dd/yy hh:mm account#." He always places a tilde (~) in the last line of each entry, which may be as many as 2500 characters. He wishes to sort the log by account number, and by ascending date and time within each account: QSORT /M2500 /16:7 /7:2 /1:5 /10:5 /T~ TIME.LOG | | The directory of users for a bulletin board system is kept | in a file binary file of fixed-length records 180 bytes | long. The user name is a 26-character field beginning in | the first position and the city/state field is a | 16-character field beginning in the fortieth position. | Sort the file by city/state and name. | | QSORT /F180 /40:16 /1:26 USER.BBS | --- Implementation Notes --- QSORT is intended as an enhanced replacement for DOS SORT. It is nearly fully upward compatible, but provides much more flexibility. Multiple sort keys may be specified, a pseudo in-place sort may be performed and files and/or records of any size may be sorted provided only that there is sufficient disk space for work files and the output file. QSORT uses the "quick sort" algorithm, which cannot guarantee the order of records whose keys are all equal. This is the one "incompatibility" with DOS SORT, which retains the original order of records when its only key compares equal. This is important to SORT because it must be invoked multiple times to effect a multiple key sort. With QSORT, you only sort once and there are usually enough keys available to insure you get the order you want the first time. QSORT uses a sort buffer of about 60K bytes and will fill the buffer as full as possible, and then sorts the contents of the buffer. If it has reached the end of the input file and has not yet generated any work files, the contents of the buffer are written to the output file, completing the sort operation. If the input file is too large to fit into the sort buffer, as much of the input file as possible is read into the buffer, sorted, then written to a temporary work file. This process is repeated as many times as necessary to process the entire input file, each time creating a new work file for the sorted output. Upon completion of the "sort phase," QSORT begins a "merge phase." Each work file is a sorted sub-set of the input file. Thus, work files may be read sequentially and combined to produce a sorted output. QSORT will open as many work files as DOS permits (more on this later). If all the remaining work files can be opened, the sorted result is written to the output file. Otherwise, a new work file is created and another merge pass will be required. On each merge pass, the number of work files is reduced and eventually all remaining work files will be opened and the sorted output file will be written completing the sort operation. QSORT is smart enough to never have just one work file remaining, which would require an unnecessary copy operation. QSORT, to work properly, needs enough space on the output disk to hold the output file. Even if the input file is to be deleted and resides in the same directory, that is not done until after the output file has been successfully written. If one merge pass is required, the default directory should have about 10% more space than the size of the input file for work files. If more than one merge pass will be required, allow about twice the size of the input file as work file space on the default disk. If QSORT has an error, it will terminate with the input file still intact, and will return to DOS with ERRORLEVEL = 1. Each time QSORT must create a new work file, the data put into it must be processed again. Obviously, the more files QSORT can open during the merge phase, the faster it can sort large files. If DOS is properly pre-conditioned, QSORT can have up to 15 work files open at once, and very large files can be sorted with just one sort pass and one merge pass. Unfortunately, that capability is not automatic. DOS has a fixed number of file "handles" that it associates with open files. The default number is eight, but DOS opens five of them for standard input, standard output, standard error, standard printer and standard auxiliary device. That leaves three for merging. A 250K input file would produce five work files and that would take three merge passes; merge two into one, leaving four; merge two into one leaving three; and finally merge three into the output file. Worse yet, since you need at least three handles for merging, if you have resident programs that have open files you can't merge at all! DOS can be told to set aside more space for file handles. Each handle is only 39 bytes and it's memory very well spent. One process can have a maximum of 20 handles open at one time, but since resident processes may be using handles, I recommend 25 to 35. To do this, the root directory of the DISK OR DISKS YOU BOOT FROM must contain a file named CONFIG.SYS. If your boot disk(s) already contains a CONFIG.SYS, edit it, or if not, create it to contain the following line: FILES=25 (or more) While we're at it, let's add one more thing to CONFIG.SYS which will improve the performance of QSORT and many other programs as well. DOS provides, by default, two disk buffers. These are the buffers it uses to do its disk reads and writes. During the merge phase QSORT may have many files open at once, reading from them in more or less random order. DOS may have to read the same physical sector several times to get all its data. But DOS can remember what's in each buffer and where it came from, and will not re-read a sector it already has in a buffer. DOS needs 528 bytes for each buffer. I recommend 20 buffers to make QSORT perform well under the most adverse conditions. This will require an additional 9504 bytes or slightly more than 9K, again memory well spent, so we add to CONFIG.SYS the following line: BUFFERS=20 I hope you find this program useful. Your comments and suggestions are welcome. My address is at the top of this file. Notes on Version 1.1: ----- -- ------- ---- QSORT 1.0 set an arbritrary maximum record length of 132. This should certainly be enough, right? Wrong! The program had not been released a month before I heard from a user who needed to sort 240-byte records. It occurred to me that, for any reasonable maximum length, someone would need to sort bigger records. The sort and merge subroutines need to know the maximum size record to be expected in order to manage their buffers, and an unnecessarily large limit would make them wasteful of space and degrade performance. The solution, of course, was to add the '/M' parameter, allowing a user to specify his own limit if 132 is not enough. One other minor change was made to the program. Version 1.0 documentation implied (in fact stated!) that column number could be omitted from a field parameter and would default to 1. In other words '/1:4' and '/:4' were equivalent. The program, however, would not accept the latter. Since this is a useful shorthand, the program has been changed to match the documentation. Notes on Version 1.2: ----- -- ------- ---- This version was borne out of my own need to sort files with mixed capitalization. ASCII sequence produced some bizarre results when words beginning with 'Z' sorted before those beginning with 'a.' Case-insensitive sorting isn't much better because upper and lower case get mixed randomly. The following table will illustrate what I mean: INPUT ASCII CASE LEXICAL INSENSITIVE DeLaPort Baker Baker Baker Smith Brown brown Brown brown DeAngelo bRown bRown deLaPorte DeLaPort Brown brown Deangelo Deangelo Deangelo DeAngelo deAngelo Deangelo deangelo Deangelo Brown DelaPort Deangelo Deangelo smith DelaPorte deAngelo deAngelo delaPorte Harry DeAngelo deangelo DelaPort Smith delaPort DeLaPort DeAngelo bRown DelaPort DelaPort DelaPorte brown delaPort delaPort deangelo deAngelo DeLaPort delaPort Harry deLaPorte DelaPorte DelaPorte delaPort deLaPorte deLaPorte deLaPorte Baker deangelo delaPorte deLaPorte deLaPorte delaPort deLaPorte delaPorte Deangelo delaPort Harry Harry bRown delaPorte smith Smith delaPort smith Smith smith The first column is a list of names in arbitrary order. The second is an ASCII sort of that list. Third we have one possible case-insensitive sort of the list. The fourth column is what I really wanted. It is sorted the way these words would be sorted in a dictionary (or lexicon). The third and fourth columns both collect words of identical spellings together, but in the third column, upper and lower case spellings are in arbitrary order, while the fourth column places upper case spellings ahead of lower case spellings. For example, the two occurrences of Smith are widely separated in column 2 because one is capitalized and the other is not. Column 3 brings the two together, but in the wrong order. They might have been in the right order, but the order is strictly arbitrary. In column 4, Smith comes before smith, and lexical sorting will always put them in this order. The lexical sort implemented in this version is achieved by making case-insensitive comparisions of entire fields. Only when a comparison of the entire field is equal is an ASCII comparison called upon to arbitrate ties. Lexical sorting can be very useful when needed, but be aware that unnecessarily specifying lexical ordering may degrade performance of QSORT. Speaking of performance, QSORT does pretty well. Matt Kanter of New York reports sorting a 650K file in 40 minutes and that ain't too shabby! Nevertheless, QSORT has been made more intelligent in the way it schedules merging of very very large files, thus improving performance when multiple merge passes are required, a feature that will go unnoticed by most users (sigh). Notes on version 2.0 ----- -- ------- --- First, you will notice there are two programs, QSORT.EXE and QSORTL.EXE in the distribution archive. The source for these programs is identical and all the above instructions apply to both. The only difference is that QSORTL is compiled for the "large data model." This allows it to use all of available memory as a sort buffer, but requires it to use a slower 32-bit addressing scheme to access data. For most normal sorting chores, QSORT will run significantly faster than QSORTL, but for very large files, or when a very large maximum record length is specified, QSORTL will outperform QSORT. Version 2.0 also adds a significant new capability to QSORT, the ability to sort logical records which span many lines of text. Two such files are illustrated in the examples above, glossaries and time accounting logs. Many other kinds of files fit this catagory. This feature, triggered by the /T parameter, usually requires the /M parameter to tell QSORT(L) the largest logical record it can expect. Finally, QSORT has been converted from Lattice C version 2.14 to version 3.0 of that compiler, resulting in a leaner, faster program. As a test, I performed identical simple sorts of various file sizes on my 640K XT (with a badly fragmented 10-meg disk) using QSORT-2.0, QSORTL-2.0 and QSORT-1.2. Here are actual test results: File Size QSORT-2.0 QSORTL-2.0 QSORT-1.2 52k 0:51.35 1:10.19 1:36.56 148k 3:41.90 3:23.39 4:49.62 296k 7:17.04 7:30.44 9:47.65 I did not have sufficient disk space to test the programs on larger files, but suspect QSORTL will have a step-function increase in running time when file size exceeds available buffer space (about 450K in my case) making it slower that QSORT from that point to about 1500K. Above that point QSORT will have to make more than one merge pass and should slow down rapidly. I would appreciate hearing your test results with very large files. Notes on Version 2.1 With this version, QSORT.COM is replaced by QSORT.EXE. If you have an earlier QSORT.COM, delete it. DOS looks first for a ".COM'" file and then a ".EXE". The most noticable change is the addition of support for fixed-length records. However, some changes were made to improve performance. For instance, instead of a single, general purpose compare function, QSORT now has several special purpose compare routines, and it selects the most efficient based on the combination of parameters you use on the command line. Because of this change, QSORTL is quite competetive with QSORT on small files, and is decidedly better on large files. I suggest you compare the two on your most typical sort jobs and use the one which performs best in that environment for all sorting. I am still not wholly satisfied with performance, and the next version will probably have compare routines written in assembly language.