home *** CD-ROM | disk | FTP | other *** search
Text File | 1988-01-23 | 51.3 KB | 1,318 lines |
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT -- Version 3.12
-
- Text File Sorting Utility
-
-
-
-
-
-
-
-
-
-
-
-
- Copyright 1985, 86, 87, 88 - Ben Baker
-
- All rights reserved
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Table of Contents
-
-
-
-
-
- Introduction 1
- Notation 1
-
- The QSORT Command and Options 2
- The /<key_spec> Parameter 3
- The /F<len> Parameter 4
- The /T[<tag>] Parameter 4
- The /D[<fields>][<delim>[<term>]] Parameter 5
- The /R Parameter 6
- The /S Parameter 6
- The /? Parameter 7
-
- Lexical Sorting 8
-
- Examples 10
-
- Implementation Notes 12
- General Information 12
- Performance and DOS Configuration 13
- Performance and Input Record Type 15
- Performance and Sort Keys 16
- Performance and File Size 17
-
- About Shareware 18
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- i
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Introduction Introduction
-
-
- QSORT was first designed to be a replacement for, and to overcome
- the limitations of DOS SORT, but has been enhanced a number of
- times and moved to new compilers twice. The current version will
- sort files whose size is limited only by available disk space.
- File name(s) may be given explicitly or QSORT will sort from
- standard input to standard output, and so, may be used in pipes
- or with redirection. Multiple keys may be specified. Binary
- files with fixed-length records may be sorted, provided only that
- keys are ASCII character strings.
-
- QSORT tries to be very protective of your data. If QSORT has an
- error of any kind, it will terminate with the input file still
- intact, and will return to DOS with ERRORLEVEL = 1. When QSORT
- successfully completes a sort, it terminates with ERRORLEVEL set
- to 0.
-
- The command line syntax is a super-set of SORT's, so QSORT may be
- used without other changes in batch files using SORT, but in most
- cases you will probably want to make use of QSORT's greater capa-
- bilities.
-
- Notation Notation
-
- In defining the command line and its various parameters, the
- following notation is used:
-
- [optional] items are enclosed in square brackets. [optional]
-
- <variable> items appear in lower case and are enclosed in <variable>
- angle brackets. They are replaced by actual data such
- as a file name. The angle brackets are NOT typed.
-
- THIS | THAT Choices are separated by a vertical bar. THIS | THAT
- Select one or the other but not both.
-
- [THIS | THAT] When the choices are enclosed in square [THIS | THAT]
- brackets, you may also select neither.
-
- REPEAT. . . The ellipsis (. . .) means the item to its left REPEAT. . .
- may be repeated as many times as necessary.
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 2 QSORT Text Sorting Utility Page 2
-
-
- UPPER CASE items and all special characters not defined UPPER CASE
- above represent themselves. They are entered exactly
- as they appear.
-
-
-
- The QSORT Command and Options The QSORT Command and Options
-
-
- QSORT is invoked with the following command:
-
- QSORT [<in_file> [<out_file>]] [/<key_spec>. . .] QSORT [<in_file> [<out_file>]] [/<key_spec>. . .]
- [/F<len> | /T[<tag> | /D[[<fields>][<delim>[<term>]]] [/F<len> | /T[<tag> | /D[[<fields>][<delim>[<term>]]]
- [/R] [/S] [/?] [/R] [/S] [/?]
-
- Note that all parameters on the command line are optional. The
- <in_file> and <out_file> parameters are "ASCII-Z" file speci-
- fiers. They may contain disk and path information in the stan-
- dard DOS format, but must not contain "wild-card" characters. If
- <in_file> is missing, QSORT sorts from standard input to standard
- output. These are files defined and opened by DOS before QSORT
- is loaded. (See your DOS manual concerning the use of redirec-
- tion and pipes.)
-
- If <in_file> is given but <out_file> is missing, QSORT creates a
- temporary file in the directory containing <in_file> and sorts to
- the temporary file. On successful completion of the sort,
- <in_file> is deleted and the temporary is renamed to <in_file>.
- The effect is an apparent "sort-in-place."
-
- If both file names are given, <in_file> is unchanged and the
- sorted output is written to <out_file>. Note that the following
- two commands are exactly equivalent:
-
- QSORT FILE.TXT FILE.SRT QSORT FILE.TXT FILE.SRT
-
- QSORT <FILE.TXT >FILE.SRT QSORT <FILE.TXT >FILE.SRT
-
- In the first, QSORT opens the files. In the second, redirection
- is specified and DOS opens the files. The result is the same.
- It is an error QSORT can't detect if you mix these. For in-
- stance:
-
- QSORT FILE.TXT >FILE.SRT QSORT FILE.TXT >FILE.SRT
-
- will result in a sort-in-place. QSORT will open FILE.TXT but
- won't know DOS has opened FILE.SRT for it, and will ignore it.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 3 QSORT Text Sorting Utility Page 3
-
-
- The /<key_spec> Parameter The /<key_spec> Parameter
-
- Up to 30 /<key_spec> parameters may be used to specify sort keys
- and are ordered major to minor from left to right. The
- /<key_spec> argument has the form:
-
- /[L][+|-][<field>.][<col>][:<length>] /[L][+|-][<field>.][<col>][:<length>]
-
- Note that all elements of this argument are "optional," but at
- least one element must be present following the slant-bar (/).
-
- The 'L', if present, specifies "lexical" sequence for this key.
- Lexical sequence is ordered first by spelling, then, when keys
- have identical spelling, by capitalization.
-
- The minus (-) sign reverses the sorting order for this key, while
- the plus (+) sign (or no sign) specifies normal sort order.
-
- There are three numbers associated with every sort key; the field
- number, the starting column within the field, and the length of
- the key in characters. Any, or all of them may be given in a
- /<key_spec> parameter. QSORT uses punctuation to identify each
- number. A number followed by a period (.) is a field number. A
- number preceded by a colon (:) is a length number. A column num-
- ber has no punctuation associated with it. It follows the field
- number, if any, and precedes the length number, if any.
-
- The [<field.>] element is used only for "delimited-field"
- records, and locates this key within a particular field. The
- value of <field.> must be less than or equal to the number of
- fields defined with the /D parameter (see below). If [<field>.]
- is omitted when sorting delimited-field records, the first field
- is assumed. For consistency, all records are assumed to have
- "fields." In all cases except delimited-field records, there is
- precisely one field, and it spans the entire record.
-
- If present, [<col>] defines the beginning column of the key. If
- omitted, column 1 is assumed. In the case of delimited-field
- records, column 1 is the first character of the identified field.
- In all other cases, column 1 is the first character of the
- record.
-
- If present, [:<length>] defines the key length in columns (or
- characters). If [:<length>] is omitted, the rest of the record,
- or field in delimited-field records, is assumed to be part of the
- key.
-
- If no key parameters are given, the entire record, or the entire
- first field is the key.
-
- When sorting variable-length records, any key which begins beyond
- the end of its field in a particular record is treated as a null
- (zero length) key for that record, and will sort low relative to
- all records with non-null values for that key. When sorting
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 4 QSORT Text Sorting Utility Page 4
-
-
- fixed-length records, all defined keys must fall within the de-
- fined record length. <key_spec> parameters must appear in order
- of importance, primary key first.
-
- The /F<len> Parameter The /F<len> Parameter
-
- The /F<len> parameter denotes the record length for a file of
- fixed-length records. All records in the input file MUST be ex-
- actly <len> bytes long. The records need not (but may) be termi-
- nated with a CR/LF sequence. They may contain any data, even bi-
- nary data, but the keys must be ASCII strings. Strings may be
- terminated with a null (binary zero) character, or may be padded
- with trailing spaces to the full length of the key.
-
- Note that QSORT does not attempt to support Pascal style strings.
- These are strings which begin with a character whose binary value
- is a character count. This is followed by <count> characters of
- ASCII data, which in turn is followed by random data out to the
- maximum length of the string. These strings may be used as keys,
- but the programmer must insure that either the last real charac-
- ter is a null character, or the key is padded to its full length
- with spaces. QSORT must be told that the key begins in the sec-
- ond character position (the first character of real data).
-
- The /T[<tag>] Parameter The /T[<tag>] Parameter
-
- The /T[<tag>] parameter, if present, indicates that the "records"
- to be sorted may be more than a single line long.
-
- If <tag> is also present, it defines a character to be used to
- tag the "end-of-record." If <tag> is not present, the first
- empty line terminates the record. For this purpose, "empty"
- means "no characters." A line containing but a single space is
- NOT empty! A line may be "tagged" by placing the <tag> character
- anywhere on the last line of a logical record. The entire line,
- including the tag character will appear as the last line of the
- record.
-
- Some characters cannot be used to represent themselves in a DOS
- command line. For that reason, QSORT uses codes to represent
- them. These codes are actually a pair of characters. The first
- is always a back-slash (\). The second character identifies the
- special character it represents. The following is a table of
- characters recognized by QSORT:
-
- \B - Space character 20 Hex 32 Decimal
- \F - Form feed character 0C 12
- \L - Line feed character 0A 10
- \N - Newline sequence
- \R - Carriage return 0D 13
- \T - Tab character 09 9
- \\ - Back-slash character itself
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 5 QSORT Text Sorting Utility Page 5
-
-
- Thus an invisible tab character might be used to end a multi-line
- logical record. The other characters in this code list don't
- make much sense in this context, but will be useful in the /D
- parameter (see below).
-
- Note that the /F<len> and /T[<tag>] parameters are incompatible,
- and may not both be specified.
-
- The /D[<fields>][<delim>[<term>]] Parameter The /D[<fields>][<delim>[<term>]] Parameter
-
- The /D[<fields>][<delim>[<term>]] parameter, if present, states
- that this file contains delimited-field records. In other words,
- a record is made up of distinct, variable length fields separated
- from one another by a particular character, or "delimiter."
- Records are separated, or "delimited" by the "newline sequence."
-
- The <fields> element defines the number of variable length fields
- contained in each record. All fields must be present in each and
- every record. A "null" field will be represented by two succes-
- sive delimiter characters. There must always be exactly <fields>
- minus 1 delimiter characters in a record.
-
- If a <delim> character is present, QSORT uses it as a field de-
- limiter character. Otherwise a comma (,) is assumed to be the
- delimiter.
-
- If a <term> character is also present, QSORT uses it as a record
- delimiter character. In fact, it literally redefines the
- "newline sequence" to QSORT. More on this in a moment.
-
- The same character codes listed under the /T parameter may be
- used to represent these characters. Note that "\N" means the
- "newline sequence." If <term> is not present, this is the CR-LF
- character pair. If <term> is present, it represents the <term>
- character. Thus:
-
- /D3\N\T /D3\N\T
-
- says that the records are separated by tab characters, and that
- the three fields within each record are also separated by tab
- characters. On the other hand:
-
- /D3\N /D3\N
-
- says that each group of three lines constitutes one logical
- record, and each line is a field within that record.
-
- The /D parameter is always incompatible with the /F parameter,
- and usually incompatible with the /T parameter, but there is an
- exception when <fields> is missing, or is equal to 1.
-
- If <fields> equals 1 (or is missing) it says that there is only
- one field spanning the entire record. But that is what QSORT as-
- sumes if the whole /D parameter is missing! So why bother?
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 6 QSORT Text Sorting Utility Page 6
-
-
- In most ASCII files a "line" ends with a carriage return charac-
- ter (CR) followed by a line feed character (LF). QSORT searches
- for this character pair when it is looking for a "newline se-
- quence."
-
- But not all files use CR-LF as a line terminator. For instance,
- files imported from UNIX or XENIX usually terminate lines with a
- naked line feed character! And some editors produce files whose
- lines end in a naked carriage return character! So:
-
- /D,\L /D,\L
-
- says "for this file, the newline sequence is a single line feed
- character." In this case, the comma is a place holder. There
- really is no "delimiter character," but one must be present in
- the parameter in order to define the <term> character.
-
- The /R Parameter The /R Parameter
-
- The /R parameter is included for compatibility with DOS SORT and
- is redundant. It reverses the sense of sort direction for all
- sort keys.
-
- The /S Parameter The /S Parameter
-
- The /S parameter tells QSORT to make a statistics report to the
- screen at the end of a run. The report is written to the
- "standard error" device, the console, and may not be redirected.
- The following is an actual statistics report "cut" from the
- screen after QSORT had sorted a 1.3+ megabyte file:
-
- -----------------------------------------------------------------
- QSORT - Text Sort - Version 3.10
- Copyright (c) 1985, 1986, 1987 - Ben Baker - All rights reserved
-
- 12115 records sorted
- 150 bytes in longest record
-
- 127131 sort phase comparisons
- 73232 merge phase comparisons
-
- 200363 total comparisons
- 16.5 comparisons per input record
-
- 27 temporary merge files created
- 2 merge passes
- 2.4 average passes over data
-
- 2:51 elapsed time
- -----------------------------------------------------------------
-
- The first two numbers are self-explanatory. The next two are the
- number of times two records were compared during the sort phase
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 7 QSORT Text Sorting Utility Page 7
-
-
- and the merge phase respectively, followed by the total compar-
- isons.
-
- The next number is total comparisons divided by the number of
- records in the input file. If there is no merge phase, this num-
- ber is typically 10 to 12. If the file is large enough to re-
- quire merging, it is 12 to 20, on average. If it is much larger
- than 20, it usually means that there is something unusual about
- your input file. It may already be sorted, or there may be large
- blocks of records which compare equal. This can happen if you
- sort on, say column 50 and the input file contains a large number
- of records shorter than 50 bytes. In this case, a minor sort key
- at column 1 may significantly speed sorting.
-
- The next two items are self-explanatory. "Average passes over
- data" reflects the number of times each record was read and writ-
- ten. For short files not requiring a merge pass, this number
- will be 1.0. When merging is needed, the last merge pass is the
- one which writes the output file and it must read and write every
- record exactly once. Thus when only one merge pass is made,
- there will be exactly 2.0 "average passes over data." In the
- above case the first merge pass processed about 40% of the
- records, hence the value of 2.4.
-
- The above sort was performed on an Zenith 248, an eight megahertz
- AT clone with two hard drives (C and D). The input file was on
- C; the temporary merge files were placed on D; and the output
- file was written to C. The sort of a 1.3 megabyte file took un-
- der three minutes. The same sort on an XT should take about
- seven minutes.
-
- The /? Parameter The /? Parameter
-
- The /? parameter requests help or parameter evaluation. When
- QSORT is executed with the /? parameter alone, it lists a short
- description of the QSORT parameters. If /? is entered as one of
- several parameters, QSORT will produce a short report on the
- screen describing the sort it would perform based on those param-
- eters without actually doing a sort.
-
- For example:
-
- QSORT /? /L5:12 /-3:2 /22 /T /R <INFILE.TXT >OUTFILE.TXT QSORT /? /L5:12 /-3:2 /22 /T /R <INFILE.TXT >OUTFILE.TXT
-
- requests a sort on a file of tagged records. It defines three
- sort keys, one of them inverted (-). The /R parameter reverses
- the sense of all three keys. Since redirection is specified,
- QSORT does not see or know about the names of the files it will
- sort. The /? parameter requests a report, rather than a sort,
- and QSORT obligingly produces on the screen, the following:
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 8 QSORT Text Sorting Utility Page 8
-
-
- -----------------------------------------------------------------
- QSORT - Text Sort - Version 3.10
- Copyright (c) 1985, 1986, 1987 - Ben Baker - All rights reserved
-
- With the present arguments, QSORT would sort from STDIN to STDOUT
- Records are multiple lines ending with an empty line
-
- Key fields in descending order of importance are:
- Field Pos Len Type
-
- 1 5 12 Lexical Descending
- 1 3 2 ASCII
- 1 22 65535 ASCII Descending
- -----------------------------------------------------------------
-
- The third key has an unspecified length. The value "65535"
- merely means that this key extends to the end of each record.
-
- The /M<len> supported in earlier versions of QSORT is no longer
- required, but will be accepted (and ignored) by QSORT. There is
- no "hard-coded" maximum record length in QSORT, but there is a
- practical limit. At some time during every sort, the two longest
- records in the input file must be compared. Therefore, the two
- longest records must be able to fit together in the sort buffer.
- The sum of their lengths cannot exceed about 50K -- not an alto-
- gether unreasonable limitation. QSORT can be shoe-horned into
- tighter memory and will run if it can find 4K for a sort buffer
- and 4K for an output buffer, but the two longest records must
- still fit in the sort buffer together.
-
- Arguments may appear in any order on the command line except that
- <in_file> must appear before <out_file>, and /<key_spec> argu-
- ments must appear in descending order of importance.
-
-
-
- Lexical Sorting Lexical Sorting
-
-
- The lexical sorting capability was borne out of my own need to
- sort word lists with mixed capitalization. ASCII sequence pro-
- duced some bizarre results when words beginning with 'Z' sorted
- before those beginning with 'a.' Case-insensitive sorting wasn't
- much better because upper and lower case got mixed randomly.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 9 QSORT Text Sorting Utility Page 9
-
-
- The following table will illustrate what I mean:
-
- INPUT ASCII CASE LEXICAL
- INSENSITIVE
-
- DeLaPort Baker Baker Baker
- Smith Brown brown Brown
- brown DeAngelo bRown bRown
- deLaPorte DeLaPort Brown brown
- Deangelo Deangelo Deangelo DeAngelo
- deAngelo Deangelo deangelo Deangelo
- Brown DelaPort Deangelo Deangelo
- smith DelaPorte deAngelo deAngelo
- delaPorte Harry DeAngelo deangelo
- DelaPort Smith delaPort DeLaPort
- DeAngelo bRown DelaPort DelaPort
- DelaPorte brown delaPort delaPort
- deangelo deAngelo DeLaPort delaPort
- Harry deLaPorte DelaPorte DelaPorte
- delaPort deLaPorte deLaPorte deLaPorte
- Baker deangelo delaPorte deLaPorte
- deLaPorte delaPort deLaPorte delaPorte
- Deangelo delaPort Harry Harry
- bRown delaPorte smith Smith
- delaPort smith Smith smith
-
- The first column is a list of names in arbitrary order. The sec-
- ond is an ASCII sort of that list. Third we have one possible
- case-insensitive sort of the list. The fourth column is what I
- really wanted. It is sorted the way these words would be sorted
- in a dictionary (or lexicon). The third and fourth columns both
- collect words of identical spellings together, but in the third
- column, upper and lower case spellings are in arbitrary order,
- while the fourth column places upper case spellings ahead of
- lower case spellings.
-
- For example, the two occurrences of Smith are widely separated in
- column 2 because one is capitalized and the other is not. Column
- 3 brings the two together, but in the wrong order. They might
- have been in the right order, but the order is strictly arbi-
- trary. In column 4, Smith comes before smith, and lexical sort-
- ing will always put them in this order.
-
- Lexical sorting is achieved by making case-insensitive compar-
- isons of entire keys. If the keys compare equal, an ASCII com-
- parison is made to arbitrate ties. In other words, when
- "lexical" keys in two records have different spellings, the case-
- insensitive comparison determines the order of the records. When
- "lexical" keys are spelled the same, the case-sensitive compari-
- son determines the order of the records.
-
- Lexical keys are defined, as indicated above, by placing the let-
- ter 'L' immediately following the slant-bar (/) in <key_spec>
- definitions.
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 10 QSORT Text Sorting Utility Page 10
-
-
- Lexical sorting can be very useful when needed, but be aware that
- unnecessarily specifying lexical ordering may degrade performance
- of QSORT.
-
-
-
- Examples Examples
-
-
- Produce a sorted directory listing and display it on the console
- a screen's worth at a time:
-
- A>DIR | QSORT | MORE A>DIR | QSORT | MORE
-
- This demonstrates the use of QSORT as a "filter" in a "pipe."
-
-
-
- Produce a directory listing sorted by creation date and time, and
- display it on the console a screen's worth at a time:
-
- A>DIR | QSORT /30:2 /24:5 /39 /34:5 | MORE A>DIR | QSORT /30:2 /24:5 /39 /34:5 | MORE
-
- The output of the DIR command is piped to QSORT. The keys de-
- fined are, from left to right (major to minor), year (2 digits),
- month and day, AM/PM flag and time. The output of QSORT is then
- piped to MORE for display.
-
-
-
- Next, replace the unsorted FILE.TXT with the same data sorted in
- reverse order. Use columns 10 to 16 as the sort key:
-
- >QSORT FILE.TXT /-10:7 >QSORT FILE.TXT /-10:7
-
- or or
-
- A>QSORT FILE.TXT /10:7 /R A>QSORT FILE.TXT /10:7 /R
-
- or (SORT compatible) or (SORT compatible)
-
- A>QSORT FILE.TXT /R /+10 A>QSORT FILE.TXT /R /+10
-
- Next, perform a simple sort on a file with up to 240-byte records
-
- A>QSORT LARGE.REC /M240 A>QSORT LARGE.REC /M240
-
- or or
-
- A>QSORT LARGE.REC A>QSORT LARGE.REC
-
- Note that the "/M240" parameter is no longer needed, but will
- not hurt.
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 11 QSORT Text Sorting Utility Page 11
-
-
-
-
- GLOSS.TXT is an unsorted glossary of terms. The term being de-
- fined by each entry appears first, followed by several lines of
- definition. The entries are separated by empty lines. Produce
- GLOSS.SRT, a sorted version of the glossary:
-
- A>QSORT /T <GLOSS.TXT >GLOSS.SRT (with redirection) A>QSORT /T <GLOSS.TXT >GLOSS.SRT (with redirection)
-
- or or
-
- A>QSORT /T GLOSS.TXT GLOSS.SRT (without redirection) A>QSORT /T GLOSS.TXT GLOSS.SRT (without redirection)
-
-
-
- A lawyer keeps a running log of his billable activities in
- TIME.LOG. The first line of each entry is "mm/dd/yy hh:mm ac-
- count#." He always places a tilde (~) in the last line of each
- entry. He wishes to sort the log by account number, and by as-
- cending date and time within each account:
-
- A>QSORT /16:7 /7:2 /1:5 /10:5 /T~ TIME.LOG A>QSORT /16:7 /7:2 /1:5 /10:5 /T~ TIME.LOG
-
-
-
- The directory of users for a bulletin board system is kept in a
- binary file of fixed-length records 180 bytes long. The user
- name is a 26-character field beginning in the first position and
- the city/state field is a 16-character field beginning in the
- fortieth position. Sort the file by city/state and name.
-
- C>QSORT /F180 /40:16 /1:26 USER.BBS C>QSORT /F180 /40:16 /1:26 USER.BBS
-
-
-
- DB.TXT is a delimited field output file from dBASE III, Each
- record contains 7 fields, delimited by commas. Sort the file to
- the screen using field 3 as a sort key.
-
- C>QSORT /D7 /3. <DB.TXT C>QSORT /D7 /3. <DB.TXT
-
- Here, "standard input" has been redirected to the file. Since no
- redirection is given for "standard output," DOS assigns it to
- the console by default. This is NOT a sort-in-place!"
-
-
-
- You have received a member list from the Society of End-users of
- XENIX (SEX.LST). Sort the list by special interest (10 columns
- beginning at 70) and name (30 columns beginning at 1). Note that
- the file contains no carriage return characters.
-
- C>QSORT SEX.LST /70:10 /1:30 /D,\L C>QSORT SEX.LST /70:10 /1:30 /D,\L
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 12 QSORT Text Sorting Utility Page 12
-
-
- The /D parameter is used to redefine the newline sequence as a
- naked line feed character.
-
-
-
- The file LABEL.TXT contains mailing label images. Each label is
- 6 lines (1 inch) high. Line six is always empty and line three
- is frequently empty. An extended Zip code always begins in col-
- umn 20 of line 5, and extends to the end of the line. In order
- to take advantage of bulk mailing rates, the labels must be
- sorted into carrier route (CRT) order.
-
- QSORT LABEL.TXT /5.20 /D6\N QSORT LABEL.TXT /5.20 /D6\N
-
- We must use a "delimited field" sort rather than a "tagged line"
- sort for two reasons: 1) Line six is empty, not tagged with a
- special character. When line three is also empty a label would
- be broken into two pieces and separated by the sorting process.
- 2) Our sort key is not at any known offset from the beginning of
- the label. Its position is fixed only relative to line five.
-
-
-
- Implementation Notes Implementation Notes
-
-
- General Information General Information
-
- QSORT is intended as an enhanced replacement for DOS SORT. It is
- nearly fully upward compatible, but provides much more flexi-
- bility. Multiple sort keys may be specified, a pseudo in-place
- sort may be performed and files and/or records of any size may be
- sorted provided only that there is sufficient disk space for work
- files and the output file. QSORT uses the "quick sort" algo-
- rithm, which cannot guarantee the order of records whose keys are
- all equal. This is the one "incompatibility" with DOS SORT,
- which retains the original order of records when its only key
- compares equal. This is important to SORT because it must be in-
- voked multiple times to effect a multiple key sort. With QSORT,
- you only sort once and there are usually enough keys available to
- insure you get the order you want the first time.
-
- QSORT uses a sort buffer of about 50K bytes and will fill the
- buffer as full as possible, and then sort its contents. If the
- end of the input file has been reached and no temporary work
- files have been generated, the sorted contents of the buffer are
- written to the output file, completing the sort operation.
-
- If the input file is too large to fit into the sort buffer, as
- much of the input file as possible is read into the buffer,
- sorted, then written to a temporary work file. This process is
- repeated as many times as necessary to process the entire input
- file, each time creating a new work file for the sorted output.
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 13 QSORT Text Sorting Utility Page 13
-
-
- Upon completion of the "sort phase," QSORT begins a "merge
- phase." Each work file is a sorted sub-set of the input file.
- Thus, work files may be read sequentially and combined to produce
- a sorted output. QSORT will open as many work files as DOS per-
- mits (more on this later). If all the remaining work files can
- be opened, the sorted result is written to the output file.
- Otherwise, a new work file is created and another merge pass will
- be required. On each merge pass, the number of work files is re-
- duced and eventually all remaining work files will be opened and
- the sorted output file will be written completing the sort oper-
- ation.
-
- Performance and DOS Configuration Performance and DOS Configuration
-
- QSORT is smart enough to never have just one work file remaining,
- which would require an unnecessary copy operation. In fact,
- QSORT is smarter than just that in its handling of the merge
- phase. If more than one merge pass is required, all the data
- merged during the first pass will have to be merged again, so
- QSORT attempts to minimize the first pass. For example, if QSORT
- discovers it may only open 15 files at a time, and there are 16
- temporary files, it will only merge two files on the first pass,
- creating a 17th file as it does. Then in the second pass, it
- will merge all 15 remaining files to the output file. The less
- data it processes twice, the faster it performs the sort!
-
- With nothing else to guide it, QSORT places its temporary files
- in the default directory. Either of two "environment variables"
- can override this. (See your DOS manual for information on envi-
- ronment variables and the SET command.) The DOS command:
-
- SET QSTMP=<path> or SET QSTMP=<path> or
-
- SET TMP=<path> or SET TMP=<path> or
-
- SET TEMP=<path> SET TEMP=<path>
-
- will define a path for QSORT to use for its temporaries. QSORT
- first looks for the environment variable QSTMP. If it does not
- exist, QSORT next looks for TMP or TEMP in that order. TMP and
- TEMP are de facto standards used by many programs, and are usu-
- ally defined in your AUTOEXEC.BAT batch file. You might have TMP
- specifying a 64K RAM disk to speed up your compiler. In this
- case, an attempt to sort a 100K file is doomed to failure.
- Rather than redefine TMP, you may define QSTMP to force QSORT to
- use some directory on your hard disk. In fact:
-
- SET QSTMP=\ SET QSTMP=\
-
- tells QSORT to always use the root directory of the default
- drive!
-
- QSORT, to work properly, needs enough space on the output disk to
- hold the output file. Even if the input file is to be deleted
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 14 QSORT Text Sorting Utility Page 14
-
-
- and resides in the same directory, that is not done until after
- the output file has been successfully written. If one merge pass
- is required, the disk space QSORT uses for temporary merge files
- will be about 10% larger than the size of the input file. If
- more than one merge pass will be required, allow about twice the
- size of the input file as temporary merge file space.
-
- One of the advantages of controlling where QSORT places its tem-
- porary files is to insure adequate space for them. A second is
- speed. If the temporary files can be placed on a separate disk
- from the input and output files, disk seeking is minimized and
- performance improved.
-
- Each time QSORT must create a new temporary merge file, the data
- put into it will be processed again. Obviously, the more files
- QSORT can open during the merge phase, the fewer times it will
- have to handle each record and the faster it can sort large
- files. If DOS is properly pre-conditioned, QSORT can have up to
- 15 temporary merge files open at once, and very large files can
- be sorted with just one sort pass and one merge pass. Unfortu-
- nately, that capability is not automatic.
-
- DOS has a fixed number of file "handles" that it associates with
- open files. The default number is eight, but DOS opens five of
- them for standard input, standard output, standard error, stan-
- dard printer and standard auxiliary device. That leaves three
- for merging. A 250K input file would produce five temporary
- merge files and that would take three merge passes; merge two
- into one, leaving four; merge two into one leaving three; and fi-
- nally merge three into the output file. In the process, QSORT
- must read and write about 80% of the file twice during the merge
- phase.
-
- Worse yet, since you need at least three handles for merging, if
- you have resident programs that have open files, you can't merge
- at all!
-
- DOS can be told to set aside more space for file handles. Each
- handle is only 39 bytes and it's memory very well spent. One
- process can have a maximum of 20 handles open at one time, but
- since resident processes may be using handles, I recommend 25 to
- 35. To do this, the root directory of the DISK OR DISKS YOU BOOT
- FROM must contain a file named CONFIG.SYS. If your boot disk(s)
- already contains a CONFIG.SYS, edit it, or if not, create it to
- contain the following line:
-
- FILES=25 (or more) FILES=25 (or more)
-
- While we're at it, let's add one more thing to CONFIG.SYS which
- will improve the performance of QSORT and many other programs as
- well. DOS provides, by default, two disk buffers. These are the
- buffers it uses to do its disk reads and writes. During the
- merge phase QSORT may have many files open at once, reading from
- them in more or less random order. DOS may have to read the same
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 15 QSORT Text Sorting Utility Page 15
-
-
- physical sector several times to get all its data. But DOS can
- remember what's in each buffer and where it came from, and will
- not re-read a sector it already has in a buffer. DOS needs 528
- bytes for each buffer. I recommend 20 buffers to make QSORT per-
- form well under the most adverse conditions. This will require
- an additional 9504 bytes or slightly more than 9K, again memory
- well spent, so we add to CONFIG.SYS the following line:
-
- BUFFERS=20 BUFFERS=20
-
- See your DOS manual for more information on CONFIG.SYS.
-
- Performance and Input Record Type Performance and Input Record Type
-
- QSORT must read and parse logical records before sorting them,
- then reassemble them before final output. The type of records
- contained in the file being sorted determines how much work this
- requires, and therefore has an impact on performance.
-
- The present version of QSORT can handle four record types: simple
- ASCII, tagged ASCII, delimited field ASCII and fixed length, de-
- termined by the presence or absence of a /T, /D or /F parameter
- on the command line.
-
- Fixed length records are very structured and require no parsing.
- Other things equal, files of fixed length records will sort the
- fastest.
-
- When parsing simple ASCII records, QSORT must find and mark the
- newline sequence, then restore it for final output. In general,
- this is relatively fast, but is affected by line length. In par-
- ticular, lines containing "over-strikes" (naked CR characters
- followed by more data) can significantly slow down the parsing.
-
- Tagged ASCII records are parsed in a fashion similar to simple
- ASCII records, if a tag character is defined. First the tag
- character is found, then the next newline sequence is found and
- marked. The time required is of course dependant on the total
- length of the logical record, but is fairly fast. If no tag
- character is defined, two successive newline sequences must be
- found. This depends not only on total length, but the number of
- lines contained in a logical record.
-
- To parse a delimited field record with n fields, n minus one de-
- limiters must be found and marked, then the newline sequence must
- be found and marked. It is similar to tagged records with no de-
- fined tag character, but because records of this type are usually
- shorter than tagged records, parsing delimited field records may
- be a little faster. It is certainly slower than parsing simple
- ASCII records.
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 16 QSORT Text Sorting Utility Page 16
-
-
- Performance and Sort Keys Performance and Sort Keys
-
- The sort keys defined on the command line have a lot to do with
- QSORT's performance. There isn't much you can do in the way of a
- strategy you can use when you need a particular file sorted in a
- particular way, but you should at least be aware.
-
- Several decisions must be made in comparing two records. Which
- field contains the current key? Is the field long enough to con-
- tain the key in one, both or neither record? Are the keys lexi-
- cal or ASCII? If the answers to any of these questions will
- remain constant over the course of a sort run, they should be
- answered once, not several thousand times!
-
- QSORT has ten record comparison routines varying in degree of
- complexity. At the beginning of each sort run it selects the
- simplest one possible, based on the parameters given, to be used
- throughout the run.
-
- If no sort key parameters are given, the entire record is used as
- a key. The compare routine has no decisions it must make -- it
- simply compares the two strings handed it. This is the "simple
- sort," and is the fastest possible case.
-
- A sort key that does not begin at the beginning of a variable
- length record, may not be contained in a particular record at
- all, while a fixed length record is known to contain all keys.
- Other things equal, files of fixed length records will sort some-
- what faster because the compare routine does not have to test for
- "key containment."
-
- Lexical keys are first compared with a "case insensitive" tech-
- nique. Each character is tested to see if it is alphabetic. If
- it is, it is converted to upper case. Then a converted character
- from each record is tested. This is obviously slower than di-
- rectly comparing two characters. In the event lexical keys com-
- pare equal, they are compared again using a direct compare tech-
- nique! Files with lexical keys sort slower than similar files
- without them.
-
- In the case of files with delimited field records, the compare
- routine must find the correct field for each key, determine if
- the keys are contained within the fields, and finally compare
- them. The added step of searching for fields slows record com-
- parison.
-
- In general, the more complex the data, the more complex the sort-
- ing task and the longer it will take. QSORT attempts to optimize
- its performance by making as many decisions as it can about your
- data up front, then selecting a compare routine that makes only
- the necessary decisions on a record-by-record basis.
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 17 QSORT Text Sorting Utility Page 17
-
-
- Performance and File Size Performance and File Size
-
- I received a letter from someone which included a graph showing
- QSORT's performance in sorting time vs. file size. He said he
- had expected an exponential, or at least a logarithmic curve.
- Instead time increased linearly with file size. I admit it puz-
- zled me at the time, but Codeview, Microsoft's debugger, made it
- ease for me to measure the performance of the various parts of
- the QSORT program. It turns out that actual sorting of data ac-
- counts for a very small percentage of QSORT's running time. It
- spends most of it's time doing I/O. For files up to about 50
- kilobytes, it will read and write each record once. From 50K to
- about 750K there will be one merge pass and each record will be
- read and written twice. Since the amount of I/O increases lin-
- early over this range of file sizes, so will sorting time.
-
- Above about 750K a second merge pass will be needed, but in this
- size range, only seven to ten percent of the data will be pro-
- cessed in the first merge pass, so the sort time vs size curve
- will steepen slightly, but will not experience a large step (as
- it did in versions 1 and 2). Doubling the file size to 1.5
- megabytes should increase the sort time about three times.
-
- Sorting time will be approximately proportional to file size
- times the "average passes over data" number from the statistics
- report. Since this number remains a constant "2.0" over a wide
- range of file sizes, sorting time will be a linear function of
- file size in that range.
-
-
-
- I hope you find this program useful. Your comments and sugges-
- tions are welcome. My address is in the next section.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- QSORT Text Sorting Utility Page 18 QSORT Text Sorting Utility Page 18
-
-
- About Shareware About Shareware
-
-
- QSORT is made available under the "shareware" concept. Shareware
- products are distributed freely and publicly. You are invited to
- "test drive" them without cost. But shareware is NOT FREE! If
- you use a product, you are expected to pay a fee for its use.
- Because overhead costs are minimal, this fee is usually a frac-
- tion of the normal commercial price the product might carry, but
- it is NOT zero!
-
- If you find this program useful, you are asked to send its author
- a license fee of $20 for each machine on which you use it. This
- will encourage the development of other useful, affordable tools.
-
- QSORT may be freely copied and distributed. provided that 1) it
- is distributed under the name "QSORT," and 2) this documentation
- file always accompanies it. Vendors wishing to distribute QSORT
- commercially, or with commercial products may contact the author
- at the address below for terms.
-
- Send checks to:
-
- Ben Baker
- Baker's Acre
- RR #1, Box 637
- E. Alton, Il 62024
-
- Bug reports or suggestions may be sent to the above address or
- via FidoNet mail to Fido node # 100/76, Baker's Acre, or by log-
- ging into Baker's Acre BBS at 618-251-2169.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-