home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Frostbyte's 1980s DOS Shareware Collection
/
floppyshareware.zip
/
floppyshareware
/
VORX
/
BOYER.ARC
/
BOYER.PAS
Wrap
Pascal/Delphi Source File
|
1986-04-10
|
19KB
|
507 lines
PROGRAM TimeTest;
{ ============================================================================ )
W. Vann Hall
Pala Designs
P.O. Box 10804
Arlington, VA 22210
CIS 70346,1144
MCI WVHALL
{ ============================================================================ )
TimeTest compares the operating speed of two non-Turbo Pos functions
that allow a string to be mapped into an array of <= MaxInt chars. One
is a typical search algorithm but is written in InLine machine code;
the other is an implementation of the Boyer-Moore search algorithm as
described in the September 1984 Scientific American, but coded in Turbo
Pascal. The results are a tad surprising.
Boyer-Moore, as you might guess from the description below, is very
dependent on the nature of the search string -- length, false starts,
etc. As a result, comparisons with the InLine BlockPos function vary
according to target string. BlockPos seems about 10 times faster for
searches for a short target string with a number of false hits. It
runs about 6 times faster on an unsuccessful search. However, it only
runs about twice as fast in searching for a long string with few false
hits. (Remember, too, that we are comparing apples and oranges here:
InLine code against a pure-Turbo implementation.)
Code has been stolen from several places for this demo. The timer
functions are lifted from Neil J. Rubenking's CONTAINS.PAS (available
on Borland SIG DL 1 as "CONTAN.PAS"), while the InLine "BlockPos"
function is a creation of Randy Forgaard and was sent to me in a SIG
message. The careful benchmarker will want to test 500 or more
repetitions of the loops instead of 50; the current program takes about
3 minutes to compile and run.
======================================================================
ABOUT Boyer-MOORE:
Boyer-Moore is an attempt to speed up the usually serial text search.
Traditionally, text searches proceed from the first character onward.
(In these examples, "string" refers to string we're trying to locate,
"text" the array of characters we're trying to find the string in.
These all ignore case sensitivity, matching around line boundaries,
punctuation, etc.)
String to find: STRING
Text to find it in: HOW YOU SEARCH FOR A STRING
Our StringPointer and TextPointer both start at 1. We compare
String[1] and Text[1]. Since "S" and "H" don't match, we increment the
text pointer and compare again. At TextPointer = 9, the "S" of
"STRING" and the "S" of "SEARCH" match, so we increment TextPointer
*and* StringPointer and compare String[2] and Text[10]. "T" and "E"
don't match, so we reset StringPointer and increment TextPointer again.
And so on. You can see that it takes 22 comparisons before we locate
the correct "S" and a full 27 before we know that we have located
"STRING."
Boyer-Moore, on the other hand, works from the end of the string (but
still the beginning of the text). First, it creates a look-up table of
values corresponding to the distance of the first occurence of each
character from the end of the string. For instance, the table for
"STRING" would read:
A=6 B=6 C=6 D=6 E=6 F=6 G=6 H=6 I=2 J=6 K=6 L=6 M=6
N=1 O=6 P=6 Q=6 R=3 S=5 T=4 U=6 V=6 W=6 X=6 Y=6 Z=6
Note that characters not located in the string are set to
Length(String), as it the final character.
Since a 6-character string can't be located entirely within the first 5
characters, we start by comparing the String[6] -- "G" -- with Text[6]:
"O". They don't match, so we increment TextPointer by the value
associated with "O" in the table; that is, by 6. Our next compare is
"G" with the "R" in "SEARCH". We now increment TextPointer by "R"'s
value, 3, and compare "S" to " ". And so on. Here's a tracing of the
progression; the letter to be compared is highlighted by lower casing:
STRINg
HOW YoU SEARCH FOR A STRING
STRINg
HOW YOU SEArCH FOR A STRING
STRINg
HOW YOU SEARCH^FOR A STRING
STRINg
HOW YOU SEARCH FOR A STRING
STRINg
HOW YOU SEARCH FOR A STRINg
STRIng
HOW YOU SEARCH FOR A STRIng
STRing
HOW YOU SEARCH FOR A STRing
STring
HOW YOU SEARCH FOR A STring
String
HOW YOU SEARCH FOR A String
string
HOW YOU SEARCH FOR A string
There; 10 comparisons total. You can see how this would speed
searching a long text -- enough to make up for the additional overhead
of compiling the look-up table.
( ============================================================================ }
TYPE
String255 = STRING[255];
OnOrOff = (On, Off); { Used by timer function }
ArrayType = ARRAY[1..MaxInt] OF CHAR;
VAR
ByteArray : ARRAY[1..MaxInt] OF BYTE;
TextArray : ArrayType ABSOLUTE ByteArray;
DummyInt : INTEGER;
OK : BOOLEAN;
MakeItEasy : BYTE;
StringLength : BYTE;
FindString : String255;
Count : BYTE;
ArrayLength : INTEGER;
Start : REAL; { Used by timer function }
Time : REAL; { Used by timer function }
CharArray : ARRAY[#0..#127] OF BYTE;
CharCount : #0..#127;
{ ============================================================================ )
TIMER MODULE BEGINS HERE
Timer functions from Neil J. Rubenking's CONTAINS.PAS.
NOTE: Because checking the time TAKES time, timings for very short
events will not be correct. You will want to time 100, 1000, or even
10000 repetitions of short events.
( ============================================================================ }
PROCEDURE Timer(O : OnOrOff);
TYPE
RegPack = RECORD
ax, bx, cx, dx, bp, si, di, ds, es, flags : Integer;
END;
VAR
RecPack : RegPack;
Hour : INTEGER;
Min : INTEGER;
Sec : INTEGER;
Hun : INTEGER;
BEGIN
WITH RecPack DO
BEGIN
ax := $2C SHL 8;
END;
Intr($21, recpack); {call interrupt}
WITH RecPack DO
BEGIN
Hour := cx SHR 8;
Min := cx AND $FF;
Sec := dx SHR 8;
Hun := dx AND $FF;
END;
IF O = On THEN
BEGIN
Start := Hour*3600+Min*60+Sec+Hun/100;
Time := 0;
END
ELSE
BEGIN
Time := Hour*3600+Min*60+Sec+Hun/100-Start;
Start := 0;
END;
END; { Timer }
{ ============================================================================ )
BlockPos
InLine replacement for the turbo "Pos" function.
By Randy Forgaard, 70307,521. Uploaded as a Borland SIG message.
( ============================================================================ }
FUNCTION BlockPos ( VAR Buffer;
Size: INTEGER;
S : String255): INTEGER;
BEGIN
{Load "buffer" address into ES:DI, "buffer" offset into BX, Length(s) -
1 into DX, contents of "s[1]" into AL, offset of "s[2]" into SI, and
"size" - Length(s) + 1 into CX. If "size" < Length(s), or if
Length(s) = 0, return zero.}
Inline($1E/ { PUSH DS }
$16/ { PUSH SS }
$1F/ { POP DS }
$C4/$BE/>buffer/ { LES DI,buffer[BP]}
$89/$FB/ { MOV BX,DI }
$8B/$8E/>size/ { MOV CX,size[bp] }
$8D/$76/<s+2/ { LEA SI,s+2[bp] }
$8A/$46/<s+1/ { MOV AL,s+1[bp] }
$8A/$56/<s/ { MOV DL,s[bp] }
$84/$D2/ { TEST DL,DL }
$74/$23/ { JZ ERROR }
$FE/$CA/ { DEC DL }
$30/$F6/ { XOR DH,DH }
$29/$D1/ { SUB CX,DX }
$76/$1B/ { JBE ERROR }
{Scan the ES:DI buffer, looking for the first occurrence of "s[1]." If
not found prior to reaching Length(s) characters before the end of the
buffer, return zero. If Length(s) = 1, the entire string has been
found, so report success.}
$FC/ { CLD }
$F2/ {NEXT: REPNE }
$AE/ { SCASB }
$75/$16/ { JNE ERROR }
$85/$D2/ { TEST DX,DX }
$74/$0C/ { JZ FOUND }
{Compare "s" (which is at SS:SI) with the ES:DI buffer, in both cases
starting with the first byte just past the length byte of the string.
If "s" does not match what is at the DI position of the buffer, reset
the registers to the values they had just prior to the comparison, and
look again for the next occurrence of the length byte.}
$51/ { PUSH CX }
$57/ { PUSH DI }
$56/ { PUSH SI }
$89/$D1/ { MOV CX,DX }
$F3/ { REPE }
$A6/ { CMPSB }
$5E/ { POP SI }
$5F/ { POP DI }
$59/ { POP CX }
$75/$EC/ { JNE NEXT }
{String found in buffer. Set AX to the offset, within buffer, of the
first byte of the string (the length byte), assuming that the first
byte of the buffer is at offset 1.}
$89/$F8/ {FOUND: MOV AX,DI }
$29/$D8/ { SUB AX,BX }
$EB/$02/ { JMP SHORT RETURN }
{An "error" condition. Return zero.}
$31/$C0/ {ERROR: XOR AX,AX }
$1F/ {RETURN: POP DS }
$8B/$E5/ { MOV SP,BP }
$5D/ { POP BP }
$C2/$08/$01) { RET 108H }
END {BlockPos};
{ ============================================================================ )
Boyer-Moore
My implementation of the Boyer-Moore search algorithm.
Simple modifications to make this routine more useful include UpCase-
ing both the string and the text character to allow case-insensitive
matching; allowing a space (#32) in the string to match any number of
characters of value #10, #13, or #32, to allow matching around line
breaks; and the building of an inverted table and backwards search
mechanism, to allow searches for preceding hits (useful interactively;
for instance, in a text editor).
( ============================================================================ }
PROCEDURE MakeTable (MatchString : String255);
VAR
StringLength : BYTE ABSOLUTE MatchString;
VAR
Counter : BYTE;
BEGIN
FillChar(CharArray[#0],128,StringLength);
FOR Counter := StringLength-1 DOWNTO 1 DO
IF CharArray[MatchString[Counter]] = StringLength THEN
CharArray[MatchString[Counter]] := StringLength-Counter;
END;
{ ---------------------------------------------------------------------------- }
FUNCTION Boyer_Moore( VAR Buffer : ArrayType;
StartPointer: INTEGER;
ArraySize : INTEGER;
MatchString : String255) : INTEGER;
VAR
ArrayPointer : INTEGER;
StringLength : BYTE ABSOLUTE MatchString;
StringPointer : BYTE;
BEGIN
StringPointer := StringLength;
ArrayPointer := (StartPointer+StringLength)-1;
REPEAT
IF Buffer[ArrayPointer] = MatchString[StringPointer] THEN
BEGIN
ArrayPointer := Pred(ArrayPointer);
StringPointer := Pred(StringPointer);
END
ELSE
BEGIN
StringPointer := StringLength;
ArrayPointer := ArrayPointer+CharArray[Buffer[ArrayPointer]];
END;
UNTIL (StringPointer = 0) OR (ArrayPointer >= ArrayLength);
IF NOT (StringPointer = 0) THEN
Boyer_Moore := 0
ELSE
Boyer_Moore := ArrayPointer+1;
END;
{ ============================================================================ )
Simple readfile operation.
Uses source for this program as text to be searched.
( ============================================================================ }
PROCEDURE ReadFile (NameStr : String255);
VAR
InFile : FILE;
CharFile : FILE OF CHAR;
Records : INTEGER;
BEGIN
Assign(InFile,NameStr);
Reset(InFile);
Records := FileSize(InFile);
BlockRead(InFile,ByteArray,Records);
Close(InFile);
Assign(CharFile,NameStr);
Reset(CharFile);
ArrayLength := FileSize(CharFile);
Close(CharFile);
End;
{ ============================================================================ )
MAIN
This program tests 50 repetitions of the BlockPos and Boyer-Moore
functions, with several variations, using the program's source code for
the text to be searched. As given here, both search functions are
case-sensitive; thus, the target string is given in lower-case in the
source code and then UpCased. This allows us to be certain, for
example, that the "end." matched in some of the tests is the upper-case
"end." at the close of the source and not the "end." in thios sentence.
Variations include searches for a short string ("end.") with many false
hits (on "end;"); searches for a longer, less ambiguous string; unsuc-
cessful searches; and searches for repeated occurences of a word (in
Boyer-Moore, only). In addition, the MakeTable procedure is included
in one of the Boyer-Moore loops in the first test to show that construct-
ing the table takes little time.
( ============================================================================ }
BEGIN
ClrScr;
ReadFile('NEWCON.PAS');
{ ------ Test 1 --------------------- }
FindString := 'end.';
FOR Count := 1 TO Ord(FindString[0]) DO
FindString[Count] := UpCase(FindString[Count]);
WriteLn('Test of BlockPos function ');
Write('50 successful tries (with many false hits) take ');
timer(on);
FOR Count := 1 TO 50 DO
DummyInt := BlockPos(TextArray,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
WriteLn('Test of Boyer-Moore function (MakeTable in loop)');
Write('50 successful tries (with many false hits) take ');
timer(on);
FOR Count := 1 TO 50 DO
BEGIN
MakeTable(FindString);
DummyInt := Boyer_Moore(TextArray,1,ArrayLength,FindString);
END;
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
WriteLn('Test of Boyer-Moore function (MakeTable not in loop)');
Write('50 successful tries (with many false hits) take ');
timer(on);
MakeTable(FindString);
FOR Count := 1 TO 50 DO
DummyInt := Boyer_Moore(TextArray,1,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
{ ------ Test 2 --------------------- }
FindString := 'few screwy hits with this zany target';
FOR Count := 1 TO Ord(FindString[0]) DO
FindString[Count] := UpCase(FindString[Count]);
WriteLn('Test of BlockPos function ');
Write('50 successful tries (with few false hits) take ');
timer(on);
FOR Count := 1 TO 50 DO
DummyInt := BlockPos(TextArray,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
WriteLn('Test of Boyer-Moore function (MakeTable not in loop)');
Write('50 successful tries (with few false hits) take ');
timer(on);
MakeTable(FindString);
FOR Count := 1 TO 50 DO
DummyInt := Boyer_Moore(TextArray,1,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
{ ------ Test 3 --------------------- }
FindString := 'not here.';
FOR Count := 1 TO Ord(FindString[0]) DO
FindString[Count] := UpCase(FindString[Count]);
WriteLn('Test of BlockPos function ');
Write('50 unsuccessful tries take ');
timer(on);
FOR Count := 1 TO 50 DO
DummyInt := BlockPos(TextArray,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
WriteLn('Test of Boyer-Moore function (MakeTable not in loop)');
Write('50 unsuccessful tries take ');
timer(on);
MakeTable(FindString);
FOR Count := 1 TO 50 DO
DummyInt := Boyer_Moore(TextArray,1,ArrayLength,FindString);
timer(off);
WriteLn( time:1:2, ' seconds');
Writeln;
{ ------ Test 4 --------------------- }
FindString := 'fake';
FOR Count := 1 TO Ord(FindString[0]) DO
FindString[Count] := UpCase(FindString[Count]);
WriteLn('Test of repeating Boyer-Moore function ');
Write('50 successful tries (with few false hits) take ');
timer(on);
DummyInt := 1;
MakeTable(FindString);
FOR Count := 1 TO 50 DO
BEGIN
DummyInt := Boyer_Moore(TextArray,DummyInt,ArrayLength,FindString);
END;
timer(off);
WriteLn( time:1:2, ' seconds');
{ ============================== TARGET STRINGS ============================== }
{
FEW SCREWY HITS WITH THIS ZANY TARGET
FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE
FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE
FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE
FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE
FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE FAKE
}
END.
Press <CR> to continue: