December 1995 Programmer's Challenge
Find Again and Again
Mail solutions to:
progchallenge@mactech.com
Due Date: 10 December 1995
This month the Challenge is to write a text search engine that is
optimized to operate repeatedly on the same text. You will be given a
block of text, some storage for data structures, and an opportunity
to analyze the text before being asked to perform any searches
against that text. Then you will repeatedly be asked to find a
specific occurrence of a given word in that block of text. The
prototypes for the code you should write are:
void InitFind(
char *textToSearch, /* find words in this block of text */
long textLength, /* number of chars in textToSearch */
void *privateStorage, /* storage for your use */
long storageSize /* number of bytes in privateStorage */
);
long FindWordOccurrence(
/* return offset of wordToFind in textToSearch */
char *wordToFind, /* find this word in textToSearch */
long wordLength, /* number of chars in wordToFind */
long occurrenceToFind, /* find this instance of wordToFind */
char *textToSearch, /* same parameter passed to InitFind */
long textLength, /* same parameter passed to InitFind */
void *privateStorage, /* same parameter passed to InitFind */
long storageSize /* same parameter passed to InitFind */
);
The InitFind routine will be called once for a given
block of textLength characters at textToSearch to
allow you to analyze the text, create data structures, and store them
in privateStorage. When InitFind is called,
storageSize bytes of memory at privateStorage will
have been preallocated and initialized to zero.
FindWordOccurrence is to search for words, where a word
is defined as a continuous sequence of alphanumeric characters
delimited by a non-alphanumeric character (e.g., space, tab,
punctuation, hyphen, CR, NL, or other special character). Your code
should look for complete words &endash;it would be incorrect, for
example, to return a value pointing to the word "these" if the
wordToFind was "the". The wordToFind will be a
legal word (i.e., no embedded delimiters).
FindWordOccurrence should return the offset in
textToSearch of the occurrenceToFind-th instance of
wordToFind. It should return -1 if wordToFind does
not occur in textToSearch, or if there are fewer than
occurrenceToFind instances of wordToFind.
Both the InitFind and the FindWordOccurrence
routines will be timed in determining the winner. In designing your
code, you should assume that FindWordOccurrence will be
called approximately 1000 times for each call to InitFind
(with the same textToSearch, but possibly differing values
of wordToFind and occurrenceToFind).
There is no predefined limit on textLength - you should
handle text of arbitrary length. The amount of
privateStorage available could be very large, but is
guaranteed to be at least 64K bytes. While the test cases will
include at least one large textToSearch with a small
storageSize, most test cases will provide at least 32 bytes
for each occurrence of a word in textToSearch, so you might
want to optimize for that condition.
Other fine print: you may not change the input pointed to by
textToSearch or wordToFind, and you should not use
any static storage other than that provided in
privateStorage.
This will be a native PowerPC Challenge, scored using the latest
CodeWarrior compiler. Good luck, and happy searching.
If you have any questions, please send me e-mail at one of the
Programmer's Challenge
addresses, or directly to
boonstra@ultranet.com.
Back to the Programmer's Challenge Page
|