Power-Programmierung

home *** CD-ROM | disk | FTP | other *** search

/ Power-Programmierung / CD1.mdf / pascal / library / dos / packer / arc / arctool / boyer.doc < prev next >

Wrap

Text File | 1988-09-29 | 3.2 KB | 94 lines

* DESCRIPTION Documentation file explaining about Boyer-MOORE. * ASSOCIATED FILES SEARCHES.PAS BOYER.DOC SEARCHES.DOC } W. Vann Hall Pala Designs P.O. Box 10804 Arlington, VA 22210 CIS 70346,1144 MCI WVHALL ABOUT Boyer-MOORE: Boyer-Moore is an attempt to speed up the usually serial text search. Traditionally, text searches proceed from the first character onward. (In these examples, "string" refers to string we're trying to locate, "text" the array of characters we're trying to find the string in. These all ignore case sensitivity, matching around line boundaries, punctuation, etc.) String to find: STRING Text to find it in: HOW YOU SEARCH FOR A STRING Our StringPointer and TextPointer both start at 1. We compare String[1] and Text[1]. Since "S" and "H" don't match, we increment the text pointer and compare again. At TextPointer = 9, the "S" of "STRING" and the "S" of "SEARCH" match, so we increment TextPointer *and* StringPointer and compare String[2] and Text[10]. "T" and "E" don't match, so we reset StringPointer and increment TextPointer again. And so on. You can see that it takes 22 comparisons before we locate the correct "S" and a full 27 before we know that we have located "STRING." Boyer-Moore, on the other hand, works from the end of the string (but still the beginning of the text). First, it creates a look-up table of values corresponding to the distance of the first occurence of each character from the end of the string. For instance, the table for "STRING" would read: A=6 B=6 C=6 D=6 E=6 F=6 G=6 H=6 I=2 J=6 K=6 L=6 M=6 N=1 O=6 P=6 Q=6 R=3 S=5 T=4 U=6 V=6 W=6 X=6 Y=6 Z=6 Note that characters not located in the string are set to Length(String), as it the final character. Since a 6-character string can't be located entirely within the first 5 characters, we start by comparing the String[6] -- "G" -- with Text[6]: "O". They don't match, so we increment TextPointer by the value associated with "O" in the table; that is, by 6. Our next compare is "G" with the "R" in "SEARCH". We now increment TextPointer by "R"'s value, 3, and compare "S" to " ". And so on. Here's a tracing of the progression; the letter to be compared is highlighted by lower casing: STRINg HOW YoU SEARCH FOR A STRING STRINg HOW YOU SEArCH FOR A STRING STRINg HOW YOU SEARCH^FOR A STRING STRINg HOW YOU SEARCH FOR A STRING STRINg HOW YOU SEARCH FOR A STRINg STRIng HOW YOU SEARCH FOR A STRIng STRing HOW YOU SEARCH FOR A STRing STring HOW YOU SEARCH FOR A STring String HOW YOU SEARCH FOR A String string HOW YOU SEARCH FOR A string There; 10 comparisons total. You can see how this would speed searching a long text -- enough to make up for the additional overhead of compiling the look-up table.