July 1997 Programmer's Challenge
Disambiguator
Mail solutions to:
progchallenge@mactech.com
Due Date: MIDNIGHT, EDT, 1 July 1997
The Challenge this month is to write a string completion routine
loosely patterned after the keyword lookup facility in the QuickView
utility. QuickView will suggest a completion of the keyword as you
begin to type it, and update that suggested completion as you
continue to type. In the Toolbox Assistant, for example, if you are
looking for documentation on InitGraf and type "i", the suggested
completion is "iconIDToRgn". As you continue by typing "n", the
suggestion becomes "index2Color". Adding "i" yields "initAllPacks";
adding "t" leaves the suggestion intact; adding "g" changes it to
"initGDevice". Finally, typing "r" gives the desired "initgraf".
For our disambiguator, you will be given an unsorted list of words
and an opportunity to preprocess them. Then you will be given a
string to match and asked to return a list of words matching
findString. To make the problem more interesting, the match string
can contain wild card characters, as described below. The prototype
for the code you should write is:
typedef unsigned long ulong;
void InitDisambiguator(
const char *const wordList[], /* words to match against */
ulong numWords, /* number of words in wordList */
void *privStorage, /* private storage preinitialized to zero */
ulong storageSize /* number of bytes of privStorage */
);
ulong /*numMatch*/ Disambiguator(
const char *const wordList[], /* words to match against */
ulong numWords, /* number of words in wordList */
void *privStorage, /* private storage */
ulong storageSize, /* number of bytes of privStorage */
char *findString, /* string to match, includes wild cards */
const char *matchList[] /* return matched words here */
);
Your InitDisambiguator routine will be called with an unsorted
list wordList of numWords null-terminated words to match. The
wordList words will include alphanumeric characters, spaces, and
underscores. You will also be provided with a pointer privStorage to
storageSize bytes of preallocated memory initialized to zero. The
amount of storage provided will be at least 20 bytes for each word in
wordList, plus one byte for each character in the wordList (including
the null byte, and rounded up to a multiple of 4). In other words,
storageSize will be no smaller than minStorage, calculated as:
for (minStorage=0,i=0; i<numWords; i++)
minStorage += 20 + 4*(1+strlen(wordList[i])/4);
InitDisambiguator is not allowed to modify the wordList, but you
may store a sorted version of wordList, or pointers to the words in
sorted order, in privStorage. The first four parameters provided to
Disambiguator will be identical as those provided to
InitDisambiguator. In addition, you will be provided with the
null-terminated findString and a preallocated array matchList with
numWords entries where you are to store pointers to the words that
match findString. Your string matches should be case insensitive
(i.e., "initgr" matches "InitGraf". The matchList should be returned
with the strings ordered in case-insensitive ASCII order (i.e., space
< [0..9] < [A-Za-z] < underscore).
The findString may also contain zero or more of the wildcard
characters '?', '*', and '+'. The wildcard '?' matches any single
character, '*' matches zero or more characters, and '+' matches one
or more characters. So, for example, "*graf" matches any string
ending in the (case-insensitive) string "graf", while "+1Ind+"
matches any string containing "1Ind" between the first and last
characters of a word.
For each call to InitDisambiguator, your Disambiguator routine
will be called an average of 100 to 1000 times. The winner will be
the solution that finds the correct matchList in the minimum amount
of time, including the time taken by the initialization routine.
This will be a native PowerPC Challenge, using the latest
CodeWarrior environment. Solutions may be coded in C, C++, or Pascal.
The problem is based on a suggestion by Charles Kefauver, who pointed
me to an April, 1995, AppleDirections article discussing the user
interface for a disambiguator. Charles wins 2 Challenge points for
his suggestion.
Post-Publication clarifications
Q1: You have defined the character set [space,0-9,A-Z,a-z,_] for
the words in wordList. Can I assume that findString will only contain
characters from the same set, in addition to [?,+,*,]? No additional
punctuation? A1: Correct. No additional punctuation.
Q2: Will wordList contain duplicates? But if there are, should all
duplicates be reported?
A2: There will be no duplicates.
Q3: Seeing that, with spaces, a "word" in wordList may be a whole
sentence, can you give an upper limit for strlen(wordList[i]), say
255?
A3: OK, the words will be no longer than 255 characters. It is not
my intent to include whose sentences in the word list.
Q4: The parameter:
const char *matchList[] /* return matched words here
*/
Should have the const as included, otherwise you can't return the
pointers from wordList.
A4: Correct. The declaration should include const as listed in
this question.
Q5: We're allowed to use small amounts of const static storage
tables right? (like a table of 256 characters or longs, that kind of
thing)
A5: OK.
|