Text File | 1993-10-04 | 24KB | 458 lines
BIGSORT(tm)
Version 1.02
Written By: David Sheppard Poor
Copyright (C) 1993: MeadowBrook Industries, Ltd
ALL RIGHTS RESERVED
DESCRIPTION: A utility to sort files:
* Sorts small to extremely large files
* Customized and complex key structures
* FAST!
* Fixed length and variable length records
* Supports comma and other delimited file formats
* Supports dBASE and compatible file formats
* Up to eight input files with a single output file
ON-LINE HELP: Enter: BigSort /?
EXECUTION:
BigSort InputFile OutputFile [Format [Key[+Key2[+Key3..]]]]
Note that only the first two parameter fields are required.
INPUT FILE: Should contain all the data you want to sort. If an input
file is not in the current directory, explicitly give the path
(e.g. C:\DATA\DATAFILE.TXT). Up to eight files can be combined as
multiple input files, with the output going to a single output
file. Wildcards (? and *) may be used, and multiple file
specifications can be combined with a plus (+) immediately between
each input file spec (e.g. C:\USER\YOU\*.DAT+C:\DATA\*.DAT). Note
that if the format of the input file is dBASE, there can only be
one input file.
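As an illustration only, the input file rules above (plus-separated specs, ? and * wildcards, at most eight files) could be modelled in Python roughly as follows. This is a sketch, not BigSort(tm)'s own code: the function name expand_input_spec, the lister parameter, and the choice to keep a pattern that matches nothing are all assumptions made for the example.

```python
import glob

MAX_INPUT_FILES = 8  # limit stated in the documentation


def expand_input_spec(spec, lister=glob.glob):
    """Split a plus-separated input spec and expand ? and * wildcards.

    `lister` is a hypothetical hook (defaulting to glob.glob) so the
    function can be tried without touching the disk.
    """
    files = []
    for pattern in spec.split("+"):
        matches = sorted(lister(pattern))
        # Assumption: a pattern with no match is kept as-is so the caller
        # can report it as a missing file.
        files.extend(matches or [pattern])
    if len(files) > MAX_INPUT_FILES:
        raise ValueError("more than %d input files" % MAX_INPUT_FILES)
    return files
```

Calling expand_input_spec(r"C:\USER\YOU\*.DAT+C:\DATA\*.DAT") would list the files matched by both specs, in the order the specs were given.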
OUTPUT FILE: All data from the input file(s) will be written to this file.
FORMAT: Describes the format of the input and output data files.
CRLF: Standard variable length records. (Default)
Custom Variable: Any variable length records. (See "FORMAT")
FIXED: Standard fixed length records.
Custom Fixed: Any fixed length records. (See "FORMAT")
COMMA: Standard comma delimited records.
Custom Delimited: Any delimited records. (See "FORMAT")
dBASE: For dBASE III and IV records.
KEY: The key defines what part of each record will be used to sort the
records. The key is made up of segments of the form:
RecordPosition[(FieldLength)][/A|/D][/S|/I][/L|/R]
Each segment consists of the following elements:
RecordPosition Starting position in the record.
FieldLength Number of bytes in the key segment.
Switches Sets case sensitivity and ordering.
FORMAT: The Format tells BigSort(tm) what kind of input to expect. There are
four basic flavors:
Fixed: All records are exactly the same number of bytes. This is
generally the most speed-efficient record format.
Variable: Records may differ in length, but each ends with a
characteristic "end-of-record" marker. This is the default.
Delimited: Variable length records, where each part of the record
is split into fields. The delimiter separates these fields.
dBASE: Records compatible with dBASE III or dBASE IV.
FIXED: Specifies that every record has the same number of bytes, and each
record has an end-of-record marker of CR (carriage return) followed
by LF (line feed). BigSort(tm) will automatically figure out the
length of the Fixed records in the input file. To use, enter the
format parameter as "FIXED". Since sorting fixed length records
is faster than sorting variable length records, specify FIXED
instead of CR/LF whenever applicable.
Custom Fixed: Specifies the number of bytes per record, which includes
the end-of-record marker (e.g. if the record ends in CR/LF, be sure
to include the two bytes in the record length). To use, set the
format to be a decimal number, such as "200".
CR/LF: Specifies the precise CR and LF byte(s) which mark the end of
each record. To use, set the format parameter to be CR, LF, CRLF,
LFCR, etc. The CR/LF end of record marker can be used for fixed
length records and/or variable length records. Note that many word
processors, editors, etc. use CRLF to denote the end of a line.
Custom Variable: Specifies the precise byte(s) used to mark the end of
each record, in HEX form. This format begins with a pound sign (#)
and is followed by the hex representation of the bytes denoting the
end of each record. This form can be used for the same purpose as
CR/LF (where CRLF would be represented as #0D0A) as well as for
more specialized markers (e.g. #00 for null-terminated records).
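To make the Custom Variable notation concrete, here is a small Python sketch of how a format such as #0D0A or #00 maps to end-of-record bytes and how a file would be cut into records with it. The function names are made up for this example, and the handling of a trailing marker is an assumption, not a description of BigSort(tm) internals.

```python
def parse_variable_format(fmt):
    """Turn a Custom Variable format such as '#0D0A' into marker bytes."""
    if not fmt.startswith("#"):
        raise ValueError("Custom Variable formats start with '#'")
    marker = bytes.fromhex(fmt[1:])
    if not marker:
        raise ValueError("the end-of-record marker may not be empty")
    return marker


def split_records(data, marker):
    """Split raw file bytes into records, dropping the marker itself."""
    records = data.split(marker)
    # Assumption: a file that ends with the marker has no extra empty
    # record after it.
    if records and records[-1] == b"":
        records.pop()
    return records
```

For instance, parse_variable_format("#0D0A") gives the two bytes CR and LF, and parse_variable_format("#00") gives a single null byte.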
COMMA: Specifies the records are comma delimited. This means each field
of the record is separated by a comma, text fields are enclosed by
double quotes, and each record is terminated by CRLF. To use,
enter the format parameter as "COMMA". Note that this is the type
of data file commonly created by programs written in BASIC.
Custom Delimited: For delimited records, other than standard COMMA
delimited. This format consists of a dollar sign ($), the hex
representations of the delimiter and text marker, a colon (:), and
an end-of-record marker. For example, the standard COMMA delimited
format can be represented as "$2C22:0D0A", where the delimiter is a
comma, the text marker is a double quote, and the end-of-record
marker is CRLF. This format is seldom used.
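The layout of a Custom Delimited format can be spelled out with a short Python sketch that takes "$2C22:0D0A" apart. It assumes the delimiter and the text marker are one byte each, which is what the example above suggests but is not stated outright; the function name is hypothetical.

```python
def parse_delimited_format(fmt):
    """Split '$2C22:0D0A' into (delimiter, text marker, end-of-record)."""
    if not fmt.startswith("$") or ":" not in fmt:
        raise ValueError("expected the form $<hex><hex>:<hex>")
    field_hex, eor_hex = fmt[1:].split(":", 1)
    pair = bytes.fromhex(field_hex)
    # Assumption: exactly one byte for the delimiter and one for the
    # text marker, as in the standard COMMA equivalent.
    if len(pair) != 2:
        raise ValueError("expected one delimiter byte and one text marker byte")
    return pair[0:1], pair[1:2], bytes.fromhex(eor_hex)
```

With the standard COMMA equivalent, the result is a comma, a double quote, and CRLF.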
dBASE: For records compatible with the dBASE III or IV file formats.
Note that only one dBASE file can be sorted at a time. To use,
enter the format as "dBASE".
KEYS: The key specifies how the file should be sorted. The default key is
"1(100)/L/A/S", i.e. the key starts in position 1, is 100 bytes long, is
read from left to right, is in ascending order, and is case sensitive.
If a key is specified, each segment of the key is specified by fields in
the form: KeyPos[(KeyLen)][/L|/R][/A|/D][/S|/I]. Each segment can be up
to 255 bytes, and the total length of all segments may not exceed 1024
bytes. When a segment is specified, the first field (the KeyPos) is
required, and the others may be omitted if the defaults are appropriate.
KEYPOS: For Fixed and Variable records:
The segment's starting position in the record. (i.e. if the
KeyPos is 10, the segment starts at the 10th byte.)
For Delimited and dBASE records:
The field number, where the first field is number 1. (i.e. if
the KeyPos is 10, the segment would be the 10th field.)
KEYLEN: For Fixed and Variable records:
Number of bytes for the key segment. If the records are of
variable length and the key goes beyond the end of the record,
#00 is automatically used to fill in the blank space.
For Delimited records:
In the case of numeric fields, always set the KeyLen to 0.
For non-numeric fields, KeyLen refers to the number of bytes
to use from the field, not including the text markers.
For dBASE records:
Do not specify a KeyLen. The length of the segment will be
set automatically, as defined by the input dBASE file.
NOTE: If a KeyPos is given without a KeyLen, the KeyLen defaults to 1.
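For Fixed and Variable records, the KeyPos and KeyLen rules above amount to taking a slice of the record and padding it with #00 when the record is too short. A minimal Python sketch, with a made-up function name and no claim to match BigSort(tm)'s actual implementation:

```python
def extract_segment(record, key_pos, key_len=1):
    """Return key_len bytes starting at the 1-based key_pos.

    Short records are padded with #00, as described for variable
    length records.  key_len defaults to 1, matching the NOTE above.
    """
    start = key_pos - 1  # KeyPos 10 means the 10th byte
    segment = record[start:start + key_len]
    return segment.ljust(key_len, b"\x00")
```

So extract_segment(b"ABC", 2, 4) yields the bytes B and C followed by two #00 fill bytes.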
SEQUENCE: The /L and /R switches denote Left-to-Right and Right-to-Left
ordering. The default is /L. Use /R when, for example, you are sorting
a binary word, and you need to reverse the order of the hi and lo bytes.
(e.g. 20(2)/R). Note that the KeyPos is still the left-most byte when
using the /R switch. (e.g. the key will use byte 21 followed by 20).
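Why /R matters for binary words can be shown with a small Python sketch. It builds hypothetical records with a 16-bit word stored low byte first at positions 20 and 21, then sorts them with and without reversing the two key bytes. The record layout and all names are assumptions for the demonstration.

```python
import struct


def segment(record, key_pos, key_len, reverse=False):
    """Return the key bytes; reverse=True mimics the /R switch."""
    start = key_pos - 1
    chunk = record[start:start + key_len]
    return chunk[::-1] if reverse else chunk


def make_record(value):
    # Hypothetical layout: 19 filler bytes, then a 16-bit word stored
    # low byte first at positions 20 and 21.
    return b"." * 19 + struct.pack("<H", value)


def word_of(record):
    return struct.unpack("<H", record[19:21])[0]


records = [make_record(v) for v in (256, 2, 255)]
left_to_right = sorted(records, key=lambda r: segment(r, 20, 2))
right_to_left = sorted(records, key=lambda r: segment(r, 20, 2, reverse=True))

print([word_of(r) for r in left_to_right])  # [256, 2, 255]
print([word_of(r) for r in right_to_left])  # [2, 255, 256]
```

Only the reversed key, the equivalent of 20(2)/R, puts the words in numeric order.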
ORDER: The /A and /D switches denote ascending or descending order. The
default is /A. Use /D when, for example, you want the largest number to
come first in the list. (e.g. 10(3)/D). Note that in the example the
first three digits starting in position 10 are used.
CASE SENSITIVITY: The /S and /I switches denote Case Sensitive and Case
Insensitive (or Ignore Case) ordering. The default is /S. Using a key
of 1(15)/I will put Michigan before MISSISSIPPI. The key 1(15)/S will
put MISSISSIPPI before Michigan. Be sure to always use the /S switch
when sorting binary numbers. Also, using the /I switch may degrade the
efficiency of BigSort(tm) by up to 25%. See "Suggestions and Technical
Notes" below for more on how to improve efficiency.
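The Michigan and MISSISSIPPI ordering follows directly from ASCII byte values, where upper case letters sort before lower case ones. A short Python sketch reproduces both results; folding to upper case for /I is an assumption here, since the documentation does not say how BigSort(tm) ignores case.

```python
def sort_names(names, ignore_case=False):
    """Sort byte strings; with ignore_case, compare upper-cased copies."""
    if ignore_case:
        # Assumption: /I behaves like comparing case-folded keys.
        return sorted(names, key=bytes.upper)
    return sorted(names)


names = [b"MISSISSIPPI", b"Michigan"]
print(sort_names(names, ignore_case=True))   # Michigan first, like /I
print(sort_names(names, ignore_case=False))  # MISSISSIPPI first, like /S
```

In the case sensitive comparison the second byte decides: "I" (hex 49) is lower than "i" (hex 69).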
SEGMENTS: The key may be broken into segments when the sorting criteria are
spread throughout the record. To do this, put a plus (+) between each
segment. Note the second segment will only be used if the first
segments match exactly. (e.g. A key such as 100(10)+200(5) will sort
starting at position 100 for 10 bytes. If two records have the exact
same 10 bytes, the 5 bytes at position 200 will serve as a tie breaker.)
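The key syntax described in this section can be summarized as a small parser. The Python sketch below reads a key such as 34(8)/D+1(14)/I into segments and applies the stated limits of 255 bytes per segment and 1024 bytes in total. It is an illustration under assumptions: the names are invented, and accepting the switches in any order is a guess based on the examples, not documented behavior.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    pos: int           # KeyPos (1-based)
    length: int        # KeyLen, defaulting to 1 when omitted
    reverse: bool      # /R given
    descending: bool   # /D given
    ignore_case: bool  # /I given


# Assumption: switches may appear in any order after the position.
_SEGMENT_RE = re.compile(r"^(\d+)(?:\((\d+)\))?((?:/[ADSILR])*)$", re.IGNORECASE)


def parse_key(spec):
    """Parse 'KeyPos[(KeyLen)][switches]' segments joined by '+'."""
    segments = []
    for part in spec.split("+"):
        match = _SEGMENT_RE.match(part)
        if match is None:
            raise ValueError("bad key segment: %r" % part)
        pos, length, switches = match.groups()
        flags = set(switches.upper().split("/")) - {""}
        seg = Segment(int(pos), int(length) if length else 1,
                      "R" in flags, "D" in flags, "I" in flags)
        if seg.length > 255:
            raise ValueError("a segment may not exceed 255 bytes")
        segments.append(seg)
    if sum(seg.length for seg in segments) > 1024:
        raise ValueError("total key length may not exceed 1024 bytes")
    return segments
```

For example, parse_key("100(10)+200(5)") returns two ascending, case sensitive, left-to-right segments, the second of which acts as the tie breaker.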
EXAMPLES: THE BASICS:
There are three sample files included. They are:
Sample.TXT Standard ASCII version of the sample data
Sample.DEL Comma delimited version of the sample data
Sample.DBF dBASE version of the sample data.
The sample data includes records of state names, capitals, their postal
abbreviations, and state populations.
To look at the input file, type:
TYPE SAMPLE.TXT | MORE
Note that the "|" character is not a colon; it is a vertical bar. If
you are working on a portable computer keyboard, you may not have access
to this character.
After trying an example, look at the output by typing:
TYPE OUT.DAT | MORE
To look at the output of the dBASE samples, you must use a database
package.
1) To sort by state name, use:
BigSort SAMPLE.TXT OUT.DAT (Defaults)
-or- BigSort SAMPLE.TXT OUT.DAT CRLF 1(14) (Format = CRLF)
-or- BigSort SAMPLE.TXT OUT.DAT FIXED 1(14) (Format = Fixed)
-or- BigSort SAMPLE.TXT OUT.DAT 43 1(14) (Format = Fixed 43)
The output from any of these will be the same. Note that the fixed
formats are faster. Since the default format is CRLF with a key of
1(100), it is often good to specify FIXED when you can.
2) To sort by population, use:
BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)
Note that numbers can be sorted when they are right justified.
3) When the populations are the same, use state names to break the tie:
BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)+1(14)
Note: when two keys match exactly for the whole key, the one which
appeared first in the input file will appear first in the output
file.
4) If you have two files with the same format, they can be sorted together.
Since there is only one text example file included, we will use it twice
for demonstration purposes:
BigSort SAMPLE.TXT+SAMPLE.TXT OUT.DAT FIXED 1(14)
The output will have two records for each state.
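Two points from the examples above can be checked with a short Python sketch: right justified numbers sort correctly as plain bytes, and records with equal keys keep their input order. The 43-byte record layout and the names and populations below are invented for the demonstration and are not the contents of SAMPLE.TXT.

```python
def make_record(name, population):
    # Hypothetical 43-byte layout: 33 bytes of text, an 8-byte right
    # justified number at positions 34-41, then CR/LF.
    text = name.ljust(33).encode("ascii")
    number = str(population).rjust(8).encode("ascii")
    return text + number + b"\r\n"


def name_of(record):
    return record[:14].rstrip().decode("ascii")


def population_key(record):
    return record[33:41]                 # like the key 34(8)


def population_then_name(record):
    return (record[33:41], record[:14])  # like the key 34(8)+1(14)


# Made-up sample data; two records share the same population.
records = [
    make_record("Delta", 500000),
    make_record("Beta", 1200000),
    make_record("Gamma", 90000),
    make_record("Alpha", 500000),
]

by_population = [name_of(r) for r in sorted(records, key=population_key)]
with_tie_break = [name_of(r) for r in sorted(records, key=population_then_name)]
print(by_population)   # ['Gamma', 'Delta', 'Alpha', 'Beta']
print(with_tie_break)  # ['Gamma', 'Alpha', 'Delta', 'Beta']
```

Python's sorted() is stable, so in the first result Delta stays ahead of Alpha, just as the note in example 3 describes for BigSort(tm). Left justified numbers would not work: as bytes, "1200000 " sorts before "500000  ".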
EXAMPLES: COMMA AND dBASE:
Both dBASE and delimited forms are based on fields, as opposed to fixed
positions.
1) To sort by state name, use:
BigSort SAMPLE.DEL OUT.DAT COMMA 1(14) (Delimited Only)
-or- BigSort SAMPLE.DBF OUT.DBF dBASE 1 (dBASE Only)
2) To sort by state population, use:
BigSort SAMPLE.DEL OUT.DAT COMMA 4 (Delimited Only)
-or- BigSort SAMPLE.DBF OUT.DBF dBASE 4 (For dBASE Only)
Note: By not specifying a length in the delimited field, BigSort(tm)
knows to treat the field as a number.
3) If there were a second file (SAMPLE2.DEL), use:
BigSort SAMPLE*.DEL OUT.DAT COMMA 1(14)
-or- BigSort SAMPLE.DEL+SAMPLE2.DEL OUT.DAT COMMA 1(14)
Note: dBASE files can not be sorted together. You can, however, sort
two dBASE files separately.
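The difference between sorting a delimited field as text and as a number can be sketched in Python with the standard csv module. The rows below are made up for illustration (they are not the contents of SAMPLE.DEL), and the function name and numeric flag are assumptions standing in for "a key with no length".

```python
import csv


def sort_delimited(text, field, numeric=False):
    """Sort comma delimited rows by a 1-based field number."""
    rows = list(csv.reader(text.splitlines()))

    def key(row):
        value = row[field - 1]
        # Assumption: numeric=True plays the role of a key given
        # without a length, so the field is compared as a number.
        return int(value) if numeric else value

    return sorted(rows, key=key)


# Made-up rows: name, capital, postal code, population.
sample = '"Aaa","Xx","AA",1200\n"Bbb","Yy","BB",900\n'

as_text = sort_delimited(sample, 4)
as_number = sort_delimited(sample, 4, numeric=True)
print([row[0] for row in as_text])    # ['Aaa', 'Bbb']
print([row[0] for row in as_number])  # ['Bbb', 'Aaa']
```

Compared as text, "1200" comes before "900"; compared as numbers, 900 comes first.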
EXAMPLES: SWITCHES:
1) If the states are sometimes all in capitals and sometimes starting with
a single capital followed by lower case letters, use:
BigSort SAMPLE.TXT OUT.DAT FIXED 1(14)/I
Note: The /I switch turns off case sensitivity (case Insensitive) and
allows upper and lower case words to be sorted together. Without
the /I switch, they would not be sorted properly. Note, however,
that the /I switch slows down processing by about 25%. Also, never
use the /I switch with binary numbers.
2) To sort by state name from "Z" to "A", use:
BigSort SAMPLE.TXT OUT.DAT FIXED 1(14)/D/I
Note: The /D switch stands for Descending ordering.
3) To sort by state name within descending population (99999 -> 00000),
use:
BigSort SAMPLE.TXT OUT.DAT FIXED 34(8)/D+1(14)/I
4) There are some cases in which you might want to reverse the order of the
characters. This is especially true when sorting binary numbers. As an
example, you might want to sort the state postal code, so that the first
letter counts as the second digit and vice-versa:
BigSort SAMPLE.TXT OUT.DAT FIXED 17(1)+16(1)
-or- BigSort SAMPLE.TXT OUT.DAT FIXED 16(2)/R
SUGGESTIONS AND TECHNICAL NOTES:
A) If the records are known to be of fixed length, use the FIXED format
or, when the length is known, the Custom Fixed (record length) format.
BigSort(tm) is fastest with fixed length records, performing about 25%
better than with the default CR/LF variable length record format.
B) Avoid using the /I switch when it is not necessary. It can slow down
processing by 25%.
C) To maximize its efficiency, have as much RAM available as possible when
running; the less swapped to disk, the better.
D) To further improve efficiency, send the output to a different volume
(another hard disk) if available. This prevents the drive from going
back and forth from the input and output files, which can add up to lots
of time. DO NOT use a floppy disk, as they are VERY slow.
E) Make sure there are sufficient FILES available, as defined in your
CONFIG.SYS. Generally only about 4 FILES are necessary. However, in
the worst case BigSort(tm) requires many more FILES. Make sure your
CONFIG.SYS sets FILES=15 or more.
F) The maximum combined length of all the input files is 2,147,483,647 bytes (2GB).
The maximum length of each record is 65535 bytes (64KB). There can be
up to eight key segments, for a total maximum length of 1024 bytes.
Each segment can be up to 255 bytes long.
G) Don't use the name of the input file for the output file. If you have a
typo or the power goes out, you could lose both the original and
sorted data!
H) Keep the key as short as possible. For instance, if one part of the key
is a number which is unique to each record, don't include any additional
key segments. Also avoid large sections of white space in the key.
I) Here is a neat trick: If you have a long text file, and you want to sort
groups of records, place an extra blank line (CR/LF) after each group.
Then enter: BigSort <InputFile> <OutputFile> CRLFCRLF <SomeKey>. This
tells BigSort(tm) that the group of records is a single variable-length
record. This works provided no group of records exceeds 64k.
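The trick in note I can be pictured with a short Python sketch: treat CRLFCRLF as the end-of-record marker, sort the groups on their leading bytes, and write them back out. The function name, the default key length of 100, and the #00 padding of short groups are assumptions chosen to mirror the defaults described earlier.

```python
def sort_groups(data, key_len=100):
    """Sort blank-line separated groups of CR/LF lines as whole units."""
    marker = b"\r\n\r\n"  # the CRLFCRLF end-of-record marker
    groups = data.split(marker)
    if groups and groups[-1] == b"":
        groups.pop()      # the file ended with the marker
    # Compare the first key_len bytes of each group, padding short
    # groups with #00 as is done for variable length records.
    groups.sort(key=lambda g: g[:key_len].ljust(key_len, b"\x00"))
    return b"".join(group + marker for group in groups)
```

Each group moves as a unit, so the lines inside a group keep their original order.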
DEFINITION OF SHAREWARE:
Shareware distribution gives users a chance to try software before
buying it. If you try a Shareware program and continue using it, you are
expected to register. Individual programs differ on details -- some
request registration while others require it, some specify a maximum
trial period. With registration, you get anything from the simple right
to continue using the software to an updated program with printed
manual.
Copyright laws apply to both Shareware and commercial software, and the
copyright holder retains all rights, with a few specific exceptions as
stated below. Shareware authors are accomplished programmers, just like
commercial authors, and the programs are of comparable quality. (In both
cases, there are good programs and bad ones!) The main difference is in
the method of distribution. The author specifically grants the right to
copy and distribute the software, either to all and sundry or to a
specific group. For example, some authors require written permission
before a commercial disk vendor may copy their Shareware.
Shareware is a distribution method, not a type of software. You should
find software that suits your needs and pocketbook, whether it's
commercial or Shareware. The Shareware system makes fitting your needs
easier, because you can try before you buy. And because the overhead is
low, prices are low also. Shareware has the ultimate money-back
guarantee -- if you don't use the product, you don't pay for it.
THE ASSOCIATION OF SHAREWARE PROFESSIONALS:
BigSort(tm) is produced by a member of the Association of Shareware
Professionals (ASP). ASP wants to make sure that the shareware principle
works for you. If you are unable to resolve a shareware-related problem
with an ASP member by contacting the member directly, ASP may be able to
help. The ASP Ombudsman can help you resolve a dispute or problem with
an ASP member, but does not provide technical support for members'
products. Please write to the ASP ombudsman at 545 Grover Road,
Muskegon, MI 49442-9427 USA, FAX 616-788-2765 or send a CompuServe
message via CompuServe Mail to ASP Ombudsman 70007,3536.
         _______
    ____|__     |                (R)
 --|       |    |-------------------
   |   ____|__  |  Association of
   |  |       |_|  Shareware
   |__|   o   |    Professionals
 -----|   |   |---------------------
      |___|___|    MEMBER
DISCLAIMER - AGREEMENT:
Users of BigSort(tm) must accept this disclaimer of warranty:
"BigSort(tm) is supplied as is. The author disclaims all warranties,
expressed or implied, including, without limitation, the warranties of
merchantability and of fitness for any purpose. The author assumes no
liability for damages, direct or consequential, which may result from
the use of BigSort(tm)."
BigSort(tm) is a "shareware program" and is provided at no charge to the
user for evaluation. Feel free to share it with your friends, but
please do not give it away altered or as part of another system. The
essence of "user-supported" software is to provide personal computer
users with quality software without high prices, and yet to provide
incentive for programmers to continue to develop new products. If you
find this program useful and find that you are using BigSort(tm) and
continue to use BigSort(tm) after a reasonable trial period, you must
make a registration payment of $20 to MeadowBrook Industries, Ltd. The
$20 registration fee will license one copy for use on any one computer
at any one time. You must treat this software just like a book. An
example is that this software may be used by any number of people and
may be freely moved from one computer location to another, so long as
there is no possibility of it being used at one location while it's
being used at another. Just as a book cannot be read by two different
persons at the same time.
Commercial users of BigSort(tm) must register and pay for their copies
of BigSort(tm) within 30 days of first use or their license is
withdrawn. Site-License arrangements may be made by contacting
MeadowBrook Industries, Ltd.
Anyone distributing BigSort(tm) for any kind of remuneration must first
contact MeadowBrook Industries, Ltd at the address below for
authorization. This authorization will be automatically granted to
distributors recognized by the (ASP) as adhering to its guidelines for
shareware distributors, and such distributors may begin offering
BigSort(tm) immediately (However MeadowBrook Industries, Ltd must still
be advised so that the distributor can be kept up-to-date with the
latest version of BigSort(tm)).
You are encouraged to pass a copy of BigSort(tm) along to your friends
for evaluation. Please encourage them to register their copy if they
find that they can use it. All registered users will receive a copy of
the latest version of the BigSort(tm) system and printed documentation.
SUPPORT:
If you have any questions or problems concerning the use of BigSort(tm),
please write to the address below. Alternatively, if you are a user of
email, write to the email address listed below for a faster response.
There are no duration restrictions placed on BigSort's support services.
When requesting support, be sure to include the version number, as well
as a complete description of the problem.
REGISTRATION:
Once you have determined BigSort(tm) is a tool you intend to use, please
register your copy:
SINGLE MACHINE LICENSE: $ 20.00
MULTIPLE SINGLE USER MACHINE LICENSES: $ 15.00 per machine
(2+ computers)
NETWORKED MACHINE LICENSES / SINGLE SERVER: $ 10.00 per node
(4+ computers)
After receiving your registration fee, MeadowBrook will send you
registered copies of the latest version of BigSort(tm) with printed
documentation, and upgrade information as it becomes available. Be sure
to include your address and to specify either 3 1/2 or 5 1/4 inch disks.
If you prefer product information to arrive via email rather than
conventional mail, please include your email address. Please see
REGISTER.DOC for more information.
CONTACTING MEADOWBROOK INDUSTRIES:
If you have any questions, suggestions, or wish to register BigSort(tm),
write to:
BigSort(tm)
MeadowBrook Industries, Ltd.
450 Veterans Drive
Burlington, NJ 08016
Or send email to: BigSort@Poor.Pgh.PA.US
BIGSORT-PLUS:
BigSortPlus(tm) is now available for $ 49.00 on a commercial basis. It
contains all the functionality of BigSort(tm) in an interactive
environment:
* Interactive version, with pull-down menus
* On-line, context-sensitive help
* Comprehensive manual included
If you would prefer to purchase BigSortPlus(tm) instead of registering
BigSort(tm), either purchase from your local software store or from the
address above. Please specify either 3 1/2 or 5 1/4 inch disks.
SINGLE MACHINE COPY OF BigSortPlus(tm): $ 49.00
Discounts are available for networked systems and multiple single user
systems. Please contact MeadowBrook Industries for more information.