home *** CD-ROM | disk | FTP | other *** search
- TUG PDS CERT 1.01 (Documentation)
-
- ==========================================================================
-
- TUG PUBLIC DOMAIN SOFTWARE CERTIFICATION
-
- The Turbo User Group (TUG) is recognized by Borland International as the
- official support organization for Turbo languages. This file has been
- verified by the TUG library staff. We are reasonably certain that the
- information contained in this file is public domain material, but
- it is also subject to any restrictions applied by its author.
-
- This diskette contains information determined to be in the PUBLIC
- DOMAIN, provided as a service of TUG for the use of its members. The
- Turbo User Group will not be liable for any damages, including any lost
- profits, lost savings or other incidental or consequential damages arising
- out of the use of or inability to use the contents, even if TUG has been
- advised of the possibility of such damages, or for any claim by any
- other party.
-
- To the best of our knowledge, the information in this file is accurate.
-
- If you discover an error in this file, we would appreciate it if you would
- report it to us. To report bugs, or to request information on membership
- in TUG, please contact us at:
-
- Turbo User Group
- PO Box 1510
- Poulsbo, Washington USA 98370
-
- --------------------------------------------------------------------------
- F i l e I n f o r m a t i o n
-
- * DESCRIPTION
- Documentation file explaining about Boyer-MOORE.
-
- * ASSOCIATED FILES
- SEARCHES.PAS
- BOYER.DOC
- SEARCHES.DOC
-
- * CHECKED BY
- DRM - 08/14/88
-
- * KEYWORDS
- TURBO PASCAL V4.0 DOCUMENTATION SEARCH INLINE
-
- ==========================================================================
- }
- W. Vann Hall
- Pala Designs
- P.O. Box 10804
- Arlington, VA 22210
- CIS 70346,1144
- MCI WVHALL
-
- ABOUT Boyer-MOORE:
-
- Boyer-Moore is an attempt to speed up the usually serial text search.
- Traditionally, text searches proceed from the first character onward.
- (In these examples, "string" refers to string we're trying to locate,
- "text" the array of characters we're trying to find the string in.
- These all ignore case sensitivity, matching around line boundaries,
- punctuation, etc.)
-
- String to find: STRING
- Text to find it in: HOW YOU SEARCH FOR A STRING
-
- Our StringPointer and TextPointer both start at 1. We compare
- String[1] and Text[1]. Since "S" and "H" don't match, we increment the
- text pointer and compare again. At TextPointer = 9, the "S" of
- "STRING" and the "S" of "SEARCH" match, so we increment TextPointer
- *and* StringPointer and compare String[2] and Text[10]. "T" and "E"
- don't match, so we reset StringPointer and increment TextPointer again.
- And so on. You can see that it takes 22 comparisons before we locate
- the correct "S" and a full 27 before we know that we have located
- "STRING."
-
- Boyer-Moore, on the other hand, works from the end of the string (but
- still the beginning of the text). First, it creates a look-up table of
- values corresponding to the distance of the first occurence of each
- character from the end of the string. For instance, the table for
- "STRING" would read:
-
- A=6 B=6 C=6 D=6 E=6 F=6 G=6 H=6 I=2 J=6 K=6 L=6 M=6
- N=1 O=6 P=6 Q=6 R=3 S=5 T=4 U=6 V=6 W=6 X=6 Y=6 Z=6
-
- Note that characters not located in the string are set to
- Length(String), as it the final character.
-
- Since a 6-character string can't be located entirely within the first 5
- characters, we start by comparing the String[6] -- "G" -- with Text[6]:
- "O". They don't match, so we increment TextPointer by the value
- associated with "O" in the table; that is, by 6. Our next compare is
- "G" with the "R" in "SEARCH". We now increment TextPointer by "R"'s
- value, 3, and compare "S" to " ". And so on. Here's a tracing of the
- progression; the letter to be compared is highlighted by lower casing:
-
- STRINg
- HOW YoU SEARCH FOR A STRING
-
- STRINg
- HOW YOU SEArCH FOR A STRING
-
- STRINg
- HOW YOU SEARCH^FOR A STRING
-
- STRINg
- HOW YOU SEARCH FOR A STRING
-
- STRINg
- HOW YOU SEARCH FOR A STRINg
-
- STRIng
- HOW YOU SEARCH FOR A STRIng
-
- STRing
- HOW YOU SEARCH FOR A STRing
-
- STring
- HOW YOU SEARCH FOR A STring
-
- String
- HOW YOU SEARCH FOR A String
-
- string
- HOW YOU SEARCH FOR A string
-
- There; 10 comparisons total. You can see how this would speed
- searching a long text -- enough to make up for the additional overhead
- of compiling the look-up table.