Volume Number: 9
Issue Number: 3
Column Tag: Programmers’ Challenge
Programmers’ Challenge
By Mike Scanlin, MacTech Magazine Regular Contributing Author
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Count Unique Words
Most word processors these days have a Count Words command. The quality in terms of accuracy and speed of these commands varies quite a bit. I tested three leading word processors with a document containing 124,829 characters and got three different answers ranging from 18446 words to 18886 words and times ranging from 4 seconds to 11 seconds. I’m not sure what the correct answer was for that document; it depends on how you define what a word is.
For purposes of this month’s challenge, a word is defined as an unbroken set of one or more letters. The input text will only contain upper and lower case letters a to z, spaces, carriage returns, periods and commas (for a total of 56 possible byte values). No digits, hyphens, tabs, other punctuation, etc. Since counting words using this simplified definition is rather trivial, you’re going to count the number of unique words instead.
The prototype of the function you write is:
unsigned short CountUniqueWords(textPtr, byteCount)
Ptr            textPtr;
unsigned short byteCount;
Your function should return the number of unique words (case insensitive) in the input text. The maximum word length for individual words in the input text is 255 characters.
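To make the definition concrete, here is a minimal, unoptimized reference sketch in C. It is not a contest entry, and it rests on assumptions of its own: the name CountUniqueWordsSketch, the const char * parameter in place of Ptr, the ANSI-style prototype, the hash function and the table size are all illustrative choices rather than part of the challenge. It also assumes a compiler that allows roughly 192K of static data.

```c
#include <assert.h>

/* Power of two, larger than the most words 65535 bytes can hold (32768). */
#define TABLE_SIZE 65536UL

static int IsLetter(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static unsigned char ToLower(unsigned char c)
{
    return (unsigned char)((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
}

/* Case-insensitive comparison of two letter runs of equal length. */
static int SameWord(const char *a, const char *b, unsigned long len)
{
    unsigned long i;
    for (i = 0; i < len; i++)
        if (ToLower((unsigned char)a[i]) != ToLower((unsigned char)b[i]))
            return 0;
    return 1;
}

unsigned short CountUniqueWordsSketch(const char *text, unsigned short byteCount)
{
    /* Open-addressing hash set: each used slot records where a word
       was first seen. A length of 0 marks an empty slot. */
    static unsigned short slotStart[TABLE_SIZE];
    static unsigned char slotLen[TABLE_SIZE];
    unsigned long pos = 0, unique = 0, i;

    for (i = 0; i < TABLE_SIZE; i++)
        slotLen[i] = 0;

    while (pos < byteCount) {
        unsigned long start, len, hash = 5381, slot;

        /* Skip spaces, returns, periods and commas. */
        while (pos < byteCount && !IsLetter((unsigned char)text[pos]))
            pos++;

        /* Scan one word, hashing its lower-case form. */
        start = pos;
        while (pos < byteCount && IsLetter((unsigned char)text[pos])) {
            hash = hash * 33 + ToLower((unsigned char)text[pos]);
            pos++;
        }
        len = pos - start;
        if (len == 0)
            break;              /* only separators remained */
        assert(len <= 255);     /* guaranteed by the challenge statement */

        /* Linear probing until we find this word or an empty slot. */
        slot = hash & (TABLE_SIZE - 1);
        while (slotLen[slot] != 0 &&
               !(slotLen[slot] == len &&
                 SameWord(text + slotStart[slot], text + start, len)))
            slot = (slot + 1) & (TABLE_SIZE - 1);

        if (slotLen[slot] == 0) {
            slotStart[slot] = (unsigned short)start;
            slotLen[slot] = (unsigned char)len;
            unique++;
        }
    }
    return (unsigned short)unique;
}
```

Under this reading of the rules, the 25-byte text "The cat. the CAT, a dog" followed by a carriage return and "A" contains four unique words: the, cat, a and dog. A winning entry would presumably replace the generic hash set with something tuned to the 56 possible byte values.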
This is the seventh Programmers’ Challenge I’ve posed to MacTech readers, and I have received very little feedback on what you think of them. Are they too easy, too hard, too uninteresting, or what? Do you want hard-core numerical analysis puzzles (like write a fast sqrt function) or do you want Mac-specific problems (like write a fast TileAndStackWindows function) or are things okay as they are? If you have any ideas for future challenges, please send them in (credit will be given in this column if I use one of your ideas). Thanks.
Two Months Ago Winner
The winner of the “Travelling Salesman” challenge is Ronald Nepsund (Northridge, CA), whose solution was the only one of the five I received that gave correct results. The time-intensive part of solutions to this class of problems is calculating the distance between two points, which involves a square root. Ronald uses a precomputed sqrt table for squared distances of 0 to 25 to eliminate much of this time.
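The table in Ronald’s listing stores each root as an integer part plus a fractional part in units of 1/0x20000000. The column does not say how he generated those numbers; the sketch below shows one way such entries could be derived using integer arithmetic only. The names isqrt64 and FixedSqrt are hypothetical, and the 64-bit types are a modern convenience that THINK C of the day would not have offered.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 29
#define FRAC_BASE ((uint64_t)1 << FRAC_BITS)   /* 0x20000000 */

/* Floor of the square root of x, by the classic bit-by-bit method. */
static uint64_t isqrt64(uint64_t x)
{
    uint64_t result = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

/* Split sqrt(n) into an integer part and a fraction measured in
   1/0x20000000 units, truncating rather than rounding.
   floor(sqrt(n) * 2^29) equals floor(sqrt(n * 2^58)). */
void FixedSqrt(unsigned n, long *intPart, long *fracPart)
{
    uint64_t scaled;

    assert(n <= 63);    /* keeps n << 58 inside 64 bits */
    scaled = isqrt64((uint64_t)n << (2 * FRAC_BITS));
    *intPart = (long)(scaled >> FRAC_BITS);
    *fracPart = (long)(scaled & (FRAC_BASE - 1));
}
```

For n = 2 this gives an integer part of 1 and a fractional part of 222379212, which agrees with the third entry of qSquTableFrac in the listing.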
A couple of people chose the algorithm “find the closest city to where we currently are and move to that city; repeat until all cities have been visited,” which does not always produce the shortest path. An example set of input data that broke every solution but Ronald’s is: numCities = 8, startCityIndex = 5, *citiesPtr = {1,1}, {2,1}, {3,1}, {2,2}, {1,3}, {2,3}, {3,3}, {2,4}. If you draw it and work it out by hand (through trial and error) you can see that the minimum path distance is 8.66. There is more than one correct ordering for the optimal path, but all of the optimal paths have that same length.
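For readers who would rather not work the example by hand, here is a small C sketch that runs both approaches on those eight cities. Everything in it is illustrative: the function names are hypothetical, ties in the nearest-city rule are broken in favor of the lowest index (the entries may have done otherwise), and a short Newton iteration stands in for sqrt() so the sketch needs no math library.

```c
#include <assert.h>

#define NUM_CITIES 8
#define START_CITY 5

/* The test data from the column, as separate h and v coordinates. */
static const int cityH[NUM_CITIES] = {1, 2, 3, 2, 1, 2, 3, 2};
static const int cityV[NUM_CITIES] = {1, 1, 1, 2, 3, 3, 3, 4};

/* Newton's method; ample precision for the small values used here. */
static double SqrtNewton(double x)
{
    double r = (x > 1.0) ? x : 1.0;
    int i;

    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

static double Dist(int a, int b)
{
    double dh = cityH[a] - cityH[b];
    double dv = cityV[a] - cityV[b];
    return SqrtNewton(dh * dh + dv * dv);
}

/* Always move to the closest unvisited city (lowest index on ties). */
double GreedyLength(void)
{
    int visited[NUM_CITIES] = {0};
    int current = START_CITY, step, i;
    double total = 0.0;

    visited[current] = 1;
    for (step = 1; step < NUM_CITIES; step++) {
        int best = -1;
        for (i = 0; i < NUM_CITIES; i++)
            if (!visited[i] &&
                (best < 0 || Dist(current, i) < Dist(current, best)))
                best = i;
        assert(best >= 0);
        total += Dist(current, best);
        visited[best] = 1;
        current = best;
    }
    return total;
}

/* Try every ordering, abandoning any partial path that is already
   no shorter than the best complete path found so far. */
static double Search(int current, int visitedMask, int count,
                     double soFar, double best)
{
    int i;

    if (soFar >= best)
        return best;
    if (count == NUM_CITIES)
        return soFar;
    for (i = 0; i < NUM_CITIES; i++)
        if (!(visitedMask & (1 << i)))
            best = Search(i, visitedMask | (1 << i), count + 1,
                          soFar + Dist(current, i), best);
    return best;
}

double OptimalLength(void)
{
    return Search(START_CITY, 1 << START_CITY, 1, 0.0, 1e30);
}
```

With this tie-breaking rule the greedy walk comes out at about 9.83, against 8.66 for the exhaustive search.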
Here is Ronald’s winning solution to the January Challenge:
//***********************************
// Travelling Salesman
// by Ronald M. Nepsund

#include <math.h>

#define fracBase 0x20000000

//There are two 32 by 20 arrays of longs
//which together give the distance between
//any two cities.
//Instead of using Array[i,j] to access
//the array Array[(i<<5)+j] is used
//and two longs are needed to accurately
//measure the distance between cities
//so two arrays of longs are used.
//gDistanceFrac is used to hold the
//fractional part of distance in 1/0x20000000
//of a unit.
long gDistanceInt[640], gDistanceFrac[640];

//these are used to represent a path between
//the cities.
Byte gNextCity[20],gOptPath[20];

//how long is the currently selected best
//path so far.
long gBestPathLength,gFracBestPathLength;

unsigned short gNumCities;
unsigned short gStartCityIndex;

//precalculated square root for zero to 25
long qSquTableInt[] =
  {0,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,
   4,4,4,4,4,4,4,4,4,
   5,5,5,5,5,5,5,5,5,5,5};

//precalculated square root - fractional part
//in 1/0x20000000 of a whole unit
long qSquTableFrac[]=
  {0,0,222379212,393016784,0,126738030,
   241317968,346685095,444758425,0,
   87122155,169986639,249162657,
   325102865,398174277,468679365,0,
   66091829,130266726,192682403,
   253476060,312767944,370664138,
   427258795,482635936,0 };

void DoPath(short cityIndex,
  long InttPathLength, long fracPathLength);

//The recursive routine that actually finds
//the shortest path.
void DoPath(register short cityIndex,
  long InttPathLength, long fracPathLength)
{
  register short i;
  Boolean lastCity;
  long offset;

  if (fracPathLength > fracBase) {
    //the fractional value variable has
    //exceeded the value of one whole
    //unit
    InttPathLength += 1;
    fracPathLength -= fracBase;
  }

  //Has the path become longer than the
  //shortest path we have already found?
  if (InttPathLength > gBestPathLength ||
      ((InttPathLength == gBestPathLength) &&
       (fracPathLength >= gFracBestPathLength)))
    return;

  //lastCity is used to tell if all the
  //cities have been visited
  lastCity = TRUE;

  //for each city
  for(i = 0; i<gNumCities; i++)
    //if not to same city or already
    //visited city
    if ( i != cityIndex &&
         gNextCity[i] == 0xFF) {
      //not at the end of the path
      lastCity = FALSE;
      //path from city ‘cityIndex’ to ‘i’
      gNextCity[cityIndex] = i;
      //offset into distance arrays
      offset = (cityIndex << 5) + i;
      //go to next city adding the
      //distance to that city to the
      //path length
      DoPath(i,
        InttPathLength+gDistanceInt[offset],
        fracPathLength+gDistanceFrac[offset]);
    } //end if and for

  // if this is the last city in the chain and
  // is a shorter path than the previous best
  if ((lastCity) &&
      ((InttPathLength < gBestPathLength) ||
       ((InttPathLength == gBestPathLength) &&
        (fracPathLength < gFracBestPathLength)) ) ) {
    // make this the current best path
    register long *LPnt1,*LPnt2;

    //this is the current shortest path
    //length now
    gBestPathLength = InttPathLength;
    gFracBestPathLength = fracPathLength;

    //copy path to ‘optPath’
    LPnt1 = (long *)&gNextCity;
    LPnt2 = (long *)&gOptPath;
    for (i= ((3+gNumCities) >> 2); i>0; i--)
      *LPnt2++ = *LPnt1++;
  } else
    //this city is no longer connected to
    //the next city
    gNextCity[cityIndex] = 0xFF;
}

void InitDistances(unsigned short numCities,
  Point *citiesPtr);

//initialize two arrays which will give the
//distance between any two cities.
void InitDistances( unsigned short numCities,
  Point *citiesPtr)
{
  short i,j,offset;
  register long *LPntl1,*LPntF1,
                *LPntI2,*LPntF2;
  long dist;
  short deltax,deltay;
  double X;

  //The distance from city i to j is the same
  //as from city j to i.
  //Use pointers into the arrays
  //We will add a constant to the pointers to
  //step through the array
  //instead of doing a multiplication to find
  //the wanted entries in the array

  //how far is it between any two cities
  for (i=0; i<numCities; i++){
    LPntl1 = gDistanceInt + i;
    LPntF1 = gDistanceFrac + i;
    offset = i << 5;
    LPntI2 = gDistanceInt + offset;
    LPntF2 = gDistanceFrac + offset;
    for (j=0; j<=i; j++)
      if (i==j){
        //both pointers are pointing
        //to the same locations in the array
        //distance to the same city is zero
        *LPntI2++ = 0;
        *LPntl1 = 0;
        LPntl1 += 32;
        *LPntF2++;
        LPntF1 += 32;
      } else {
        //calculate horizontal and vertical
        //distance between city ‘i’ and ‘j’
        deltax = citiesPtr[i].h-
                 citiesPtr[j].h;
        deltay = citiesPtr[i].v-
                 citiesPtr[j].v;

        //The distance between the cities is
        // squareRoot( deltax*deltax +
        //             deltay*deltay)
        //Where you can, do multiplications
        //using shorts instead of longs -
        //They are faster.
        if (-255< deltax && deltax<256)
          if (-255< deltay && deltay<256)
            dist = ((long)(deltax*deltax) +
                    (long)(deltay*deltay));
          else
            dist = ((long)(deltax*deltax) +
                    (long)deltay*deltay);
        else
          if (-255< deltay && deltay<256)
            dist = ((long)deltax*deltax +
                    (long)(deltay*deltay));
          else
            dist = ((long)deltax*deltax +
                    (long)deltay*deltay);

        //do squareRoot
        if (dist <= 25) {
          //use sqrt lookup tables for
          //0 to 25
          *LPntI2++ = *LPntl1 =
            qSquTableInt[dist];
          LPntl1 += 32;
          *LPntF2++ = *LPntF1 =
            qSquTableFrac[dist];
          LPntF1 += 32;
        } else {
          X = sqrt(dist);
          //gDistanceInt[(i<<5) + j] = X;
          //gDistanceInt[(j<<5) + i] = X;
          //integer part of distance
          // between points
          dist = X;
          *LPntl1 = *LPntI2++ = dist;
          LPntl1 += 32;
          //gDistanceFrac[i<<5 + j] =
          //  (X - dist) * $20000000;
          //gDistanceFrac[j<<5 + i] =
          //  (X - dist) * $20000000;
          // fractional part
          dist = (X - dist) * fracBase;
          *LPntF2++ = *LPntF1 = dist;
          LPntF1 += 32;
        }
      }
  }
}

void OptimalPath(unsigned short numCities,
  unsigned short startCityIndex,
  Point *citiesPtr,Point *optimalPathPtr);

void OptimalPath(numCities,startCityIndex,citiesPtr,
  optimalPathPtr)
unsigned short numCities;
unsigned short startCityIndex;
Point *citiesPtr;
Point *optimalPathPtr;
{
  register short i,j;
  long time,index;
  double X;

  //generates the tables for the distances
  //between any two cities.
  //This routine takes up most of the time.
  InitDistances(numCities,citiesPtr);

  //0xFF means that there is no path from
  //this city to another
  for (i=0; i<numCities; i++)
    //no paths between cities
    gNextCity[i] = 0xFF;

  gNumCities = numCities;
  gStartCityIndex = startCityIndex;

  //any path done by DoPath will be shorter
  //than this
  gBestPathLength = 0x7FFFFFFF;
  gFracBestPathLength = 0;

  //find the best path
  DoPath(startCityIndex,0,0);

  //put the best path into the form
  //desired for ‘optimalPath’
  j=startCityIndex;
  for(i=0; i<numCities; i++) {
    optimalPathPtr[i] = citiesPtr[j];
    j = gOptPath[j];
  }
}
Rules
Here’s how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners up). The prize for the best solution each month is $50 and a limited edition “The Winner! MacTech Magazine Programming Challenge” T-shirt (not to be found in stores).
In order to make fair comparisons between solutions, all solutions must be in ANSI compatible C. All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with ANSI Settings plus “Honor ‘register’ first” and “Use Global Optimizer” turned on) so beware if you optimize for a different C compiler.
The solution and winners for this month’s Programmers’ Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.
All solutions should be marked “Attn: Programmers’ Challenge Solution” and sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or preferably, e-mail - AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com, and CompuServe: 71552,174. If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on “How to Contact Xplain Corporation.”
MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month and all entries are the property of MacTech Magazine upon submission. The submission falls under all the same conventions of an article submission.