HexFind - a fast pattern finder for OS/2 version 2.1 (10/11/2000) Copyright 1998 - 2000 by Heinz Repp I wrote HexFind mainly because I wanted a program that would search for any sequence of bytes in any type of file, in contrast to the grep family or the system's find command which is restricted to string patterns and files of printable lines. Therefore the search argument is given to HexFind as a sequence of bytes in hexadecimal notation, a pair for each byte, and the positions in the files matching the pattern are output as byte offsets from the beginning (hexadecimal and decimal). HexFind 2.0 has been completely rewritten with an improved search algorithm and the ability to search recursively through subdirectories as well as hidden and system files for patterns including literal strings and wildcards. Version 2.1 added the ability to search for strings case- insensitive and a Win32 version. HexFind is invoked with: HexFind [] [ ...] the search pattern may consist of: - bytes in hexadecimal notation; every byte must be given as pair of hexadecimal digits, e. g. a9, ff, 03 or 00 are valid bytes - literal strings enclosed in single quotes, e. g. 'string', '34' - one or more , each representing one byte of arbitrary value at any but the last position in the search pattern The search pattern is ONE argument to HexFind, so it must be either contiguous (not containing any blanks) or enclosed in double quotes. A valid pattern would be <1b'K'?'xyz'0d0a> or the equivalent <"1B 'K' ? 'xyz' 0D 0A"> (without the <>). options follow the syntax [-|/][v|r|s|h|i][+|-], in detail: '-' or '/' as switching character one of: 'v' = verbose; if activated HexFind reports every file it searches through and a summary of files and bytes searched at the end 'r' = recursive; if activated HexFind searches also all files matching the file specifier in all subdirectories of the starting directory 's' = system; if activated also files with the system attribute are searched 'h' = hidden; if activated also files with the hidden attribute are searched 'i' = ignore case; if activated searches for the string part of the search pattern case-insensitive '+' or '-' to activate or deactivate the specified option; '+' may be omitted; at start all options are deactivated. Examples of valid options are '/v+', '-h-' or '/R /V'. Case does not matter, and successive options may be concatenated; so '/V+ /R+ /S-' is equivalent to '/vrs-'. The options may appear at any position on the command line. HexFind evaluates the command line in order, so all options are valid only for the following file specifiers. The command line HexFind /v 'pattern' /v- /r /r- /v prints the search pattern as hexadecimal coded bytes, then searches for the string 'pattern' - without listing the files searched - in all files matching in the current directory and all its subdirectories, then in all files matching in the current directory only, and prints at last the summary of all files searched. Some notes on the case-insensitive search: the -switch affects only letters (A-Z and a-z) in the string parts of the pattern (if any), NOT the bytes coded hexadecimal; so it is possible (but seldom meaningful) to search only for parts of the pattern case-sensitive. The -switch should precede the pattern, else there is no effect at all (besides increasing the search time a little bit). A combination like HexFind /i+ 'PaTtErN' /i- would also be possible; this would perform an exact search on the pattern with all letters lowercase ('pattern'). The file specifiers consist of an optional absolute or relative path and a valid filename including wildcards. If no file specifier is given, HexFind searches the standard input, allowing its use with pipes. The search algorithm used uses elements from D. M. Sunday's 'Quicksearch' and the extensions 'Optimal Mismatch' (described by Guido Gronek in the German magazine c't 3/95, pp. 278) and 'Turbosearch' by Michael Tamm (c't 8/97, pp. 292). I developed it on my own by extending the way that Turbosearch outperforms Quicksearch and have called it 'Fastsearch'. The elements of Optimal Mismatch make it robust when used with highly repetitive patterns and/or files without sacrificing performance significantly. HexFind uses a static probability table for all possible byte values found empirically by examining the contents of my complete hard disk, about 2 Gigabyte of very different kinds of data. In typical search scenarios my 'raw' algorithm (without the Optimal Mismatch extension) is up to 10% faster than Turbosearch, which itself outperforms Quicksearch by another 10%. Its trick is that it reduces the number of comparisons to zero most of the time! The inner circle is extremely short, consisting only of dereferencing a pointer, a conditional branch and one (pointer + integer) addition. The practical relevance of the very fast search algorithm is limited because it is not the most time critical part of the program. The searching routine consumes less than 10% of the computing time. Most time is needed to open and read the files into memory. The first version of HexFind tried to address this by using multiple threads, one to search for matching files and opening them, one to read them into memory buffers and one to search those buffers. Unfortunately the gain of multithreading was very small, mostly outweighed by the overhead necessary for the inter-thread communication. Numerous experiments showed that the only way threads could speed things up significantly would be the use of a separate output thread, especially if there are many hits. This output thread would use a private queue, but the same effect can be achieved by redirecting standard output to a pipe. There are many separate programs already available for this purpose providing buffering between program output and standard output (e. g. the buffer program included in Andreas Kaiser's GTAK/GTAR package). For this reason I left off multithreading completely in this version of HexFind. HexFind is freeware. You may use it freely, give it away to friends, post it for download and include it in shareware/ freeware collections on any media provided that there is not more than a nominal charge for the copying process and the media, and that the package is left unchanged, including source, executable and this file. Use it completely at your own risk! The obligatory word at last: THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR OF ANY OTHER TYPE, WHETHER EXPRESS OR IMPLIED, AND TO ANY REMEDY AGAINST ME, WHETHER IN CONTRACT, TORT, DELICT, QUASI-DELICT OR OTHERWISE. SOME JURISDICTION DO NOT ALLOW THE EXCLUSION OF CERTAIN IMPLIED WARRANTIES SO THE PRECEDING EXCLUSIONS MAY NOT APPLY. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT WILL I BE LIABLE FOR ANY SPECIAL, CONSEQUENTIAL, INCIDENTAL OR DIRECT OR INDIRECT DAMAGES (INCLUDING WITHOUT LIMITATION LOSS OF PROFIT) ARISING OUT OF USER'S USE OR INABILITY TO USE THE SOFTWARE OR ANY INFORMATION ACCOMPANYING IT, WHETHER OR NOT I HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSS, HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY. THIS EXCLUSION INCLUDES ANY LIABILITY THAT MAY ARISE OUT OF THIRD-PARTY CLAIMS AGAINST THE USER. THESE LIMITATIONS SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY.