June 1996 Programmer's Challenge
Postman's Sort
Mail solutions to:
progchallenge@mactech.com
Due Date: 1 June 1996
NOTE EARLIER DUE DATE
Everyone knows that sorting algorithms require execution time that
grows faster than linearly with the number of input records.
Specifically, sorting N input records is known to require somewhere
between O(N*log2(N)) and O(N**2) comparisons, depending on the
specific algorithm and whether one is considering average or
worst-case time, right? Wrong. Your Challenge is to code a sorting
algorithm that requires time proportional to the number of input
records.
The bounds in the previous paragraph apply to sorting algorithms
that are based on pairwise comparison of input keys. A distribution
sort, however, requires no key comparisons, and has different
performance characteristics. Imagine, for example, how the post
office might sort mail. Initially, the mail might be distributed into
piles by country. Each of those piles might be distributed into other
piles by state. Those piles might be distributed by city, and then by
street, and finally by address. All of this could be accomplished by
making a sequence of passes through the input without ever comparing
the sort keys of two input records to one another. This month, you
are to implement a distribution (or "postman's") sort algorithm.
The prototype for the code you should write is:
#include <stdio.h>pascal OSErr PostmansSort(
FILE *inFile, /* stream containing sort input */
FILE *outFile, /* stream to contain sort output */
void *privateStorage, /* preallocated storage for your use */
size_t storageSize /* number of bytes of privateStorage */
);
The input will consist of records in the following format:
[FirstName] [MiddleName] LastName
StreetAddress StreetName City State [ZipCode]
[Country]
Input records should be read from inFile, sorted, and
written to outFile. Both the input and output files will be
opened and closed for you by the calling routine. Records will
consist of up to 9 fields, as indicated above, and be terminated by a
carriage return (0x0D). Fields will be separated by tabs (0x09). The
square brackets in the format description are meta-characters
indicating optional fields; the brackets will not be present in the
input. Fields may contain embedded spaces or special characters
(except tabs and returns).
Records are to be sorted into ascending order with
Country as the primary sort key, ZipCode (if
present) as the secondary key, then StreetName,
StreetAddress, LastName, FirstName, and
finally MiddleName, in that order. If ZipCode is
not present, then State and City should replace it
as the secondary and tertiary sort keys, respectively; otherwise
State and City should be ignored. Sort order should
be lexicographic except when a field value is purely numeric, and a
purely numeric field should be treated as smaller than a field
containing non-numeric characters. That is, field values
"1", "2", "10", "1B",
"1C", and "2A" would sort in the order indicated.
Empty optional fields should be considered to have the smallest
possible value (e.g., records with no Country should be
output before records with a Country value). Your routine
should return zero (noErr) if it completes normally, or a non-zero
error code if it is unable to complete the sort for any reason.
There are no predetermined limits on the number of distinct
Country, State, City, etc. values that
might be present in the input. Your routine will be provided with
storageSize bytes (at least 64KB) of preallocated storage,
initialized to zero by the calling routine, and pointed to by
privateStorage. You may not allocate any additional memory,
although use of small amounts of static memory is permitted. If your
solution requires additional storage, you should create and write to
temporary disk files. Adequate disk space is guaranteed. In
approximately half of the test cases, you will be guaranteed at least
32 bytes of memory per record, plus 32 bytes for each unique field
value. Scoring will weight these high-memory cases equally with the
cases where less memory is provided.
This will be a native PowerPC Challenge, scored using the
Metrowerks environment. Solutions may be coded in C, C++, or Pascal.
If you use a language other than C, you should ensure that your code
can be called from a test driver written in C. Most importantly, to
be considered correct, your code must sort the input into the proper
order without performing any key comparisons between input records.
The fastest correct solution will be the winner.
This Challenge is based on a suggestion from Peter Shank, passed
on to me by Mike Scanlin. Peter forwarded a reference to an article
on the subject by
Robert
Ramsey, which you can find at
http://www.silcom.com/~ramey/article.html.
Back to the Programmer's Challenge Page
Last modified by Bob Boonstra on 5/12/96.
|