BIGSORT V4.2 : A Fast in-memory sort for files of any size.
----------------------------------------------------
User Supported Program
Continued use requires a donation of $20
(C)1988-93 Turgut Kalfaoglu <turgut@frors12.bitnet>,<turgut@frmop11.bitnet>
BIGSORT uses the fastest known sorting algorithm to sort files that can
be as large as your swapping area (not RAM) allows. A wide range of
options along with multiple key fields enable you to pinpoint the
desired sorting method.
BIGSORT is especially well suited for batch files, and for being called
from other programs. It returns specific error codes, and never prompts
for verification or additional information. Because it always uses the
defined (primary) collating sequence for your country, BIGSORT places
your national characters in the correct order.
Unless the /Verbose option is specified, BIGSORT never writes its
messages to the standard output, to prevent them from getting written to
an output file. Its messages are always directed to "standard error."
Under OS/2, it is possible to redirect stderr to a file, if desired.
This program is shareware: A registration allows you to stay up-to-date
on enhancements to the product, and enables you to purchase the source
code.
Usage:
BIGSORT [options] < inputfile > outputfile
If you omit the '< inputfile' part, BIGSORT will wait for input from
the keyboard. If that is what you wish, enter the data, ending each
line with RETURN, then enter CTRL-Z to finish the entry.
If you omit the '> outputfile' part, BIGSORT will send its output to the
screen.
For some online help, type
BIGSORT HELP
The normal usage of BIGSORT is either thru OS/2 "pipes", or thru
redirection. Pipes enable a program's output to be sent as input to a
second program. This is specified by using the "|" symbol between the
two programs. Redirection is similar: it allows the output of a command
to be sent to a file, instead of getting displayed on the screen. The
">" symbol indicates that the output should be sent to the file, not to
the screen. Note that the > symbol causes the previous contents of the
file to be lost. The >> symbol can be used to append to the previous
contents.
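The filter pattern described above (read standard input, write the
result to standard output, keep messages on standard error) can be
sketched in a few lines. This is only an illustration in Python, not
BIGSORT's own code; the function name sort_stream and the message text
are made up for the example.

```python
def sort_stream(infile, outfile, errfile):
    """Read all lines, sort them, and write them back out.

    Diagnostics go to errfile only, so they never mix with the sorted data.
    """
    lines = infile.read().splitlines()
    lines.sort()  # plain case-sensitive sort, starting at column 1
    for line in lines:
        outfile.write(line + "\n")
    errfile.write(f"{len(lines)} lines sorted\n")  # hypothetical message
    return len(lines)
```

Called as sort_stream(sys.stdin, sys.stdout, sys.stderr), such a
function behaves like a filter: redirecting its output with ">" captures
only the sorted lines, while the message still appears on the screen.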
Options
-------
Use options to change the default behavior of BIGSORT, which is:
* Start sorting on the first position of each line,
* Do a case-sensitive alphanumeric sort,
* Reserve room for 100,000 lines of input file. (Lines, not
bytes).
If you wish to use multiple options, you need to separate them by
spaces. The options available with this version are:
/+nnn where nnn is a number, will cause BIGSORT to start sorting
items from that column. If omitted, BIGSORT sorts the file
starting from the first character.
/+nnn-mmm where nnn and mmm are the column numbers, causes the program
to focus only on the area between those two columns. This
option can be repeated as many times as necessary to specify
secondary sort keys. See the section on multiple ranges.
/R Reverses the sort order. The sorting order will be descending
order for that field, if this option is specified.
/Ds Specifies the symbol to use as a delimiter in date fields.
's' can be either a dash "-", a slash "/", a period
"." or nothing. If "/D " is specified, BIGSORT assumes that
the digits are attached to each other, like 19921220, to
specify a date of Dec 20th, 1992. Default: /D-
/I Ignore case. Without this option, A comes before a, and Z comes
before a. Use this option to prevent this.
/MMDDYY The field is a date field, in the format of MM-DD-YY. Unless
the /D option is specified, BIGSORT assumes that dashes
separate the digits.
/DDMMYY Similar - the date field is in DD-MM-YY format.
/YYMMDD Similar - the date field is in YY-MM-DD format.
/Snnnn Specifies the index size for the file. One index entry is needed
for each line in your file in order to load it. Normally BIGSORT
reserves room for 100 thousand lines. If the number of lines (or
records) in your file is more than 100 thousand, you need to specify
the /S option. For example, if your file is 200 thousand lines, and
you expect it to grow, you may tell BIGSORT to reserve room for 500
thousand: /S500000
/N Indicates that the field specified is a numeric field, thus a
numeric comparison should take place. This prevents leading
blanks and other characters from interfering with the sorting
order of numbers. If /N is specified, BIGSORT compares the
field contents after converting them to floating-point
numbers. This ensures accurate sorting of numbers with decimal
places.
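To see why the date and /N options matter, here is a small Python
sketch of the kind of comparison keys they imply. It is an illustration
under assumptions, not BIGSORT's implementation: the names date_key and
numeric_key are invented, two-digit years are compared as plain numbers,
and when there is no delimiter the month and day are assumed to take two
digits each, with the year taking the rest (as in 19921220).

```python
def date_key(text, order="MMDDYY", delimiter="-"):
    """Turn a date field into a (year, month, day) tuple for comparison."""
    text = text.strip()
    letters = order[::2]  # "MMDDYY" -> "MDY", "YYMMDD" -> "YMD"
    if delimiter:
        parts = text.split(delimiter)
    else:
        # Assumption: month and day use two digits, the year uses the rest.
        widths = {"Y": len(text) - 4, "M": 2, "D": 2}
        parts, pos = [], 0
        for letter in letters:
            parts.append(text[pos:pos + widths[letter]])
            pos += widths[letter]
    fields = dict(zip(letters, (int(p) for p in parts)))
    return (fields["Y"], fields["M"], fields["D"])


def numeric_key(text):
    """Compare as a floating-point number, ignoring surrounding blanks."""
    return float(text.strip())


dates = ["12-22-92", "1-05-93", "9-11-92"]
print(sorted(dates))                # character order, not chronological
print(sorted(dates, key=date_key))  # chronological order

sizes = ["100", "20", "3.5"]
print(sorted(sizes))                   # character order: "100" comes first
print(sorted(sizes, key=numeric_key))  # numeric order: 3.5 comes first
```

A plain alphanumeric comparison looks at one character at a time, which
is why "100" lands before "20" and why dates in MM-DD-YY form do not
come out in chronological order without a date option.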
Specifying Ranges
-----------------
Ranges limit the area that BIGSORT focuses on. For example, if you
wanted to sort the files displayed by the OS/2 DIR command by file
size, you could tell BIGSORT to sort the data based on the information
contained between columns 17 and 27. The option for that would look
like: "/+17-27".
BIGSORT parses options from left to right, changing its current settings
as it goes along. When it comes upon a range field, it records the
current setting of such options as "/R", "/I", "/N" and the date-related
options for that range. It then resets these options to the defaults,
so that you can construct a completely new set of rules for your second
sort field specification, if any.
Let's see this with an example. Let's try to sort DIR's output on
creation date, and then on the name. Our idea is to display the files in
chronological order, with files created on the same day sorted by name
among themselves.
Here is a ruler line, followed by a sample output of the DIR command:
0        1         2         3         4         5
12345678901234567890123456789012345678901234567890
 9-11-92   8:01p     <DIR>         922  .
 9-11-92   8:01p     <DIR>         690  ..
12-22-92   7:14p      5638           0  BIGSORT.BAK
12-22-92   7:42p      5267           0  BIGSORT.C
12-22-92   7:14p       869           0  BIGSORT.H
12-22-92   8:44p      5167           0  BIGSORT.TXT
12-22-92   7:12p      3004           0  COMPARES.C
(...)
The command to give to BIGSORT to sort the above list would be:
DIR | BIGSORT /MMDDYY /d- /+1-9 /i /+41-80 > myfile.output
Let's analyze this command. When OS/2 "sees" this line, it first erases
and opens the file "myfile.output", which will store the results of the
operation. It then runs the "DIR" command, passing its output to BIGSORT
with all the parameters.
When BIGSORT starts, it defaults to using its case-sensitive
alphanumeric sort. The first option changes this to a sort on a date
field. The second option specifies that the month, day and year digits
are separated by dashes, which is the default, by the way. When we
specify the range, "/+1-9" our /MMDDYY option is recorded as the desired
sort method for the first range, and BIGSORT resets the sort method to
case-sensitive alphanumeric sort. Now it reads the "/I" option and
switches its current sort method to case-insensitive, alphanumeric sort.
When it reads the "/+41-80" range specifier, it records that our second
field selection should be sorted using an alphanumeric, but
case-insensitive sort.
When BIGSORT is done, it sends the result to "standard output" which has
been redirected to a file with the "> myfile.output" part of the
command.
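The left-to-right parsing described above can be sketched in Python as
follows. This is a hedged illustration, not the program's real parser:
parse_options, DEFAULTS and the dictionary keys are invented names, the
sketch simply skips the global /S and /V options, and it assumes that
options given without any range apply to the whole line from column 1.

```python
import re

DATE_FORMATS = ("MMDDYY", "DDMMYY", "YYMMDD")
DEFAULTS = {"reverse": False, "ignore_case": False, "numeric": False,
            "date_format": None, "delimiter": "-"}


def parse_options(args):
    """Return one settings dict per range, in command-line order."""
    ranges = []
    current = dict(DEFAULTS)
    for arg in args:
        opt = arg[1:].upper()  # drop the leading "/"
        match = re.fullmatch(r"\+(\d+)(?:-(\d+))?", opt)
        if match:
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else None
            # Record the settings seen so far for this range...
            ranges.append({"start": start, "end": end, **current})
            # ...then reset them for the next range.
            current = dict(DEFAULTS)
        elif opt in DATE_FORMATS:
            current["date_format"] = opt
        elif opt == "R":
            current["reverse"] = True
        elif opt == "I":
            current["ignore_case"] = True
        elif opt == "N":
            current["numeric"] = True
        elif opt.startswith(("S", "V")):
            continue  # global options, not tied to a range; skipped here
        elif opt.startswith("D"):
            current["delimiter"] = opt[1:]
        else:
            raise ValueError(f"unknown option: {arg}")
    if not ranges:
        # Assumption: with no range given, sort from the first column.
        ranges.append({"start": 1, "end": None, **current})
    return ranges


for spec in parse_options("/MMDDYY /d- /+1-9 /i /+41-80".split()):
    print(spec)
```

For the DIR example, the first range comes out with the MMDDYY date
format and case sensitivity left at its default, and the second range
comes out case-insensitive with no date format, which matches the
walk-through above.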
Each time a range is specified, the specified options are recorded for
that range, and the options are reset. Thus, you may have to specify the
same option several times in the command line, in some cases. For
example, to sort on the last name, then on the first name, both using
a case-insensitive sort, you need to put something like:
TYPE myfile | BIGSORT /i /+10-25 /i /+27-40
Multiple Ranges
---------------
BIGSORT accepts up to twenty ranges. This means that you can specify up
to twenty different "zones" in your data for BIGSORT to sort on.
Multiple key fields are useful if, for example, you want your database
output sorted on the date field, and within that, on the last name of
the person. You can tell BIGSORT that the first field definition
corresponds to a date, and the second one to a name. This way, BIGSORT
will go on to sort records on the name if the dates are identical.
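The tie-breaking behaviour of multiple key fields can be illustrated
with a short Python sketch. The helper make_sort_key and the sample
records are hypothetical; the dates are written as YYYY-MM-DD only so
that a plain text comparison already puts them in chronological order.

```python
def make_sort_key(ranges):
    """Build a key function from (start, end, ignore_case) column ranges.

    Columns are 1-based and inclusive, as in "/+10-25".
    """
    def key(line):
        parts = []
        for start, end, ignore_case in ranges:
            field = line[start - 1:end]
            parts.append(field.lower() if ignore_case else field)
        return tuple(parts)
    return key


records = [
    "1992-12-22 Smith",
    "1992-09-11 Young",
    "1992-12-22 adams",
]

# First key: columns 1-10 (date). Second key: columns 12-40 (name, no case).
by_date_then_name = make_sort_key([(1, 10, False), (12, 40, True)])
for line in sorted(records, key=by_date_then_name):
    print(line)
```

Because the keys are compared as a tuple, the second field is consulted
only when the first fields are equal, so "adams" and "Smith" are ordered
by name within their shared date.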
BIGSORT and multiple files:
---------------------------
If you wish to sort and merge several files into one, you can do it with
one command under OS/2. Just do a:
TYPE *.* | BIGSORT > result.txt
Note that the TYPE command also sends the filename of the file it is
processing to "standard output". You may have to remove such records
manually.
BIGSORT Swap Area:
------------------
BIGSORT creates no swap area of its own, but uses OS/2 to allocate
necessary memory to load the entire file, along with an index pointer
for each line. You can "guesstimate" that an input file of 8MB will
occupy a little over 8MB of memory when BIGSORT is working on that file.
OS/2 will first allocate all available RAM, then provide the rest from
its swap area. Since this area grows and shrinks automatically, there is
nothing wrong with having a temporarily large swap file - just make sure
to have enough space on the disk where the swap file resides.
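The "a little over 8MB" guess can be turned into rough arithmetic. The
sketch below is an estimate only: the function name estimate_memory is
invented, the 4-byte pointer size is an assumption, and so is the idea
that the whole index (100,000 entries by default, or the /S value) is
reserved up front.

```python
def estimate_memory(file_bytes, index_entries=100_000, pointer_bytes=4):
    """Rough bytes needed: the whole file plus one pointer per index entry."""
    return file_bytes + index_entries * pointer_bytes


eight_mb = 8 * 1024 * 1024
total = estimate_memory(eight_mb)
print(total, "bytes, of which", total - eight_mb, "are index")
```

Under these assumptions the index adds 400,000 bytes to an 8MB file, and
raising the index size with /S500000 would add 2,000,000 bytes instead.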
Shareware
---------
BIGSORT represents countless hours of work. Please contribute to the
shareware discipline by sending a $20 registration fee to:
Turgut Kalfaoglu
1378 Sok. 8/10
Izmir 35210
Turkey
Source Code
-----------
BIGSORT is compiled under IBM C/SET 2, at CSD level 28. Clear and
well-documented source code is available ONLY to registered users. Send
$10 (to cover shipping charges) and a blank disk (either 3.5 or 5.25) to
the above address to receive the source code.
The author encourages you to register, but also to ask for additional
features or send comments. You do not need to register this software to
request additional features or to report problems or suggestions.
However, regular use requires a donation of $20 to the above address.
Update History
--------------
Version 4.2:
------------
Fixes problem with sorting based on date specifications.
Adds more timing information into the "/V" (verbose) option.
Version 4.1:
------------
Implements the "/N" (numeric field) option.
Version 4:
----------
Implements multiple key fields.
Implements sort ranges.
Implements the /D option.
Implements country-specific information.
Version 4 is almost a complete re-write with division of source code
into five segments.
Version 3:
----------
Implements unlimited input filesize.
Version 2.2 (changes from V2.1):
--------------------------------
Added features: It can now handle dates as well!
Now /R and /I can be used at the same time.
Improved performance, but code size still about
the same (you should see the tricks that were done
to keep it that way :)
Bugs Removed: None were found in V2.1