December 1998 Programmer's Challenge
Word Neighbors
Mail solutions to:
progchallenge@mactech.com
EXTENDED Due Date: 11:59pm ET, Saturday, 5 December 1998
Several of you have written me to point out that the Challenges have been getting more and more difficult over the years. Thinking back over the past few months, where we have solved an orbital mechanics problem, programmed a winning Hearts solution, and written an emulator for the first stored-program computer, I can see where this might be true. This month the problem is easier. Your Challenge is to write a search routine that will find a set of words that are near to one another in some target text. You might imagine the solution being used as part of an internet search engine, providing a capability more selective than looking for all of the target words being present on a page, but less restrictive than requiring the target words to form a phrase.
The prototype for the code you should write is:
#if defined(__cplusplus)
extern "C" {
#endif
pascal void Initialize(
char *text, /* NULL terminated text to be searched */
long distance, /* max distance between nearby words */
void *privateStorage, /* private storage for your use */
long storageBytes /* number of bytes in privateStorage */
);
pascal long FindNearby( /* return number of matches found */
char *words[], /* words to find in text */
long numWords, /* number of words */
Boolean caseSensitive, /* true if match is case sensitive */
Boolean preserveOrder, /* true if words must be found in order */
long matchPositions[], /* position in text of first word in match */
long maxMatches /* max number of matches to return */
);
#if defined(__cplusplus)
}
#endif
Your Initialize routine is called to give you an opportunity to preprocess the text to be searched. The text consists of a sequence of words separated by delimiters. For the purpose of this Challenge, a word is a sequence of alphanumeric characters separated by one or more non-alphanumeric characters. The text may include ASCII characters between 0x20 and 0x7E, inclusive, plus carriage returns (0x0D), linefeeds (0x0A), and tabs (0x09). All non-alphanumeric characters are delimiters.
You are given the maximum distance to be allowed between adjacent search words. A distance of 0 corresponds to adjacent words, a distance of 1 corresponds to search words with one intervening word, etc. You are also given a privateStorage pointer to storage available for your use, along with the size (storageBytes) of privateStorage in bytes. There will be at least 16 bytes of storage for each unique (case-sensitive) word in the text, plus 4 bytes for each word occurrence.
For each time that Initialize is called with new text to be searched, your FindNearby routine will be called an average of 10 to 50 times with a set of numWords words to find. You will be told whether the search is to be caseSensitive or not, and whether the words must be found in order (preserveOrder==TRUE) in the text. You must find the first maxMatches occurrences of the words in the text, where a match occurs when each of the words is separated by no more than a maximum number (distance) of intervening words from another of the search words. For each match, you should return in matchPositions the offset in text of the first of the search words found in text. For example, if distance==2 and the text is "The quick brown fox jumps over the lazy dog.", you would return 4 if the words were "brown", "quick", and "jumps", provided preserveOrder is FALSE. No word in the text may be part of more than one match.
The winner will be the solution that correctly finds the word matchPositions in the minimum execution time, including the time taken by the Initialize routine.
This will be a native PowerPC Challenge, using the latest CodeWarrior environment. Solutions may be coded in C, C++, or Pascal.
You can get a head start on the Challenge by reading the
Programmer's Challenge mailing list. It will be posted to the list on
or before the 12th of the preceding month. To join, send an email to
listserv@listmail.xplain.com
with the subject "subscribe challenge-A". You can also join the
discussion list by sending a message with the subject "subscribe
challenge-D".
|