- 1.) What do we have here?
-
- This is the README file for 'strings'. 'strings' is a rewrite
- and replacement for the 4BSD program of the same name.
- 'strings' looks for sequences of printable characters in a file and
- outputs them.
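-
- As a point of reference, the basic task can be sketched in C as a plain
- byte-by-byte loop. This is a minimal sketch, not taken from strings.c:
- the name naive_strings is hypothetical, the whole input is assumed to be
- in memory, and "printable" is assumed to mean ASCII 0x20 (space) through
- 0x7e ('~').
-
```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: printable means ASCII 0x20 (space) through 0x7e ('~'). */
static int is_print(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}

/* Hypothetical reference version: look at every byte of buf[0..n) and
 * report each run of at least min_len printable bytes.  Each run is
 * written on its own line to out (skipped if out is NULL).
 * Returns the number of runs found. */
size_t naive_strings(const unsigned char *buf, size_t n, size_t min_len,
                     FILE *out)
{
    size_t found = 0, run = 0;

    for (size_t i = 0; i <= n; i++) {
        if (i < n && is_print(buf[i])) {
            run++;                      /* still inside a printable run */
            continue;
        }
        /* the run ended at i: non-printable byte or end of buffer */
        if (run > 0 && run >= min_len) {
            found++;
            if (out != NULL) {
                fwrite(buf + i - run, 1, run, out);
                fputc('\n', out);
            }
        }
        run = 0;
    }
    return found;
}
```
-
- Calling naive_strings(buf, n, 4, stdout) prints every run of 4 or more
- printable bytes, one per line, which is the job described above.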
-
- Current version is 1.6.9.
-
- (For best results when reading these files, set your favourite editor
- to display a TAB as 4 spaces. In vi, for example, use "set tabstop=4".)
-
- You should have the following files:
-     README      - this file
-     COPYING     - copyright notice
-     Makefile    - Makefile
-     strings.h   - header file; includes config.h
-     config.h    - settings for the known systems
-     tune.h      - settings for your own system (UFLAGS=-DUSE_USER_DEFINES)
-     strings.c   - the main source
-     limits.c    - the UNIX (trademark of AT&T) specific code that
-                   identifies the initialized data segment
-     output.c    - output routines
-     test_input  - a file containing the two characters the original
-                   strings stumbled over; unpack it with atob
-     strings.1   - manual page
-     strings.txt - manual page without nroff sequences
-
- 2.) How to build strings.
-
- Now that you have strings you will want to build it.
- The program is shipped with UFLAGS undefined (see below for an explanation).
- On UNIX (trademark of AT&T) systems, you should be able to build the
- program by just typing "make". On non-UNIX systems you might have problems.
-
- a.) Edit Makefile. If you want to play it safe, set UFLAGS=-DSAFETY_FIRST.
- strings should then compile without any problems.
- You won't get the UNIX-specific code: the program does not
- try to identify the initialized data segment.
-
- b.) If you don't want to play it safe but would rather configure
- strings for your system, take a look at config.h first. It contains a
- list of systems. If yours is one of them, edit Makefile and set UFLAGS
- to nothing; the defines for your system are then used when compiling.
- WARNING: some things may differ between different versions of the
- same system. On some machines there is no easy way to distinguish
- between such versions.
- If you were wrong, and your system is not one of the known ones, the
- minimal defaults from a.) are used.
-
- c.) If you want to configure strings, but your system is not in the
- list of known systems: edit Makefile and set UFLAGS=-DUSE_USER_DEFINES,
- then edit tune.h and set things up for your system. The variables are
- commented.
-
- There are three header files. The inclusion works like this
- (you can skip this):
-
- (+ = yes, - = no)
-
-                (reading strings.h)
-                         |
-                         v
-               (including config.h)
-                         |
-                         v
-    +<--(+)------ is SAFETY_FIRST defined?
-    |                    |
-    |                   (-)
-    |                    v
-    |       is USE_USER_DEFINES defined? --(+)--> use stuff from tune.h --> continue
-    |                    |
-    |                   (-)
-    |                    v
-    |            is this machine 1? --(+)--> use its defines --> continue
-    |                    |
-    |                   (-)
-    |                    v
-    |            is this machine 2? --(+)--> use its defines --> continue
-    |                    |
-    |                   (-)
-    |                    v
-    |                   ...
-    |                    |
-    |                   (-)
-    v                    v
-       use safe defaults
-                |
-                v
-             continue
-
- The program, or rather the header files, know about the following
- machines, which they recognize by the predefined preprocessor symbols
- listed:
-
- - VAX 11/780 (4.3 BSD) by "unix" and "vax" and not "ultrix"
- - SIEMENS PC-MX2 (SINIX v2.1) by "nsc3200" and "sinix" and "ns16000"
- - Sun 3/260 (SunOS 3.5) by "unix" and "sun" and "mc68020"
- - VAX 6800 (Ultrix 2.1) by "unix" and "ultrix" and "bsd4_2"
- - uVAX (VMS 5.1) by "vms" and "vax"
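-
- The decision chain and the machine list above can be sketched as a
- preprocessor cascade in the style of config.h. This is a hypothetical
- reconstruction, not the real header: CFG_NAME and config_name() are
- invented only to make the chosen branch visible, and a real config.h
- would set its own configuration macros (and #include "tune.h" in the
- USE_USER_DEFINES branch) instead.
-
```c
/* Hypothetical sketch of the selection logic in config.h.
 * CFG_NAME and config_name() are invented for illustration only. */
#if defined(SAFETY_FIRST)
#  define CFG_NAME "safe defaults (SAFETY_FIRST)"
#elif defined(USE_USER_DEFINES)
   /* the real header would #include "tune.h" at this point */
#  define CFG_NAME "user settings from tune.h"
#elif defined(unix) && defined(vax) && !defined(ultrix)
#  define CFG_NAME "VAX 11/780, 4.3 BSD"
#elif defined(nsc3200) && defined(sinix) && defined(ns16000)
#  define CFG_NAME "SIEMENS PC-MX2, SINIX v2.1"
#elif defined(unix) && defined(sun) && defined(mc68020)
#  define CFG_NAME "Sun 3/260, SunOS 3.5"
#elif defined(unix) && defined(ultrix) && defined(bsd4_2)
#  define CFG_NAME "VAX 6800, Ultrix 2.1"
#elif defined(vms) && defined(vax)
#  define CFG_NAME "uVAX, VMS 5.1"
#else
   /* unknown machine: fall back to the minimal safe defaults */
#  define CFG_NAME "safe defaults (unknown machine)"
#endif

/* Report which branch the preprocessor took when this file was compiled. */
const char *config_name(void)
{
    return CFG_NAME;
}
```
-
- Compiling with -DSAFETY_FIRST, for instance, makes config_name() return
- "safe defaults (SAFETY_FIRST)" whatever the machine.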
-
- 3.) Why is this strings better than the standard one?
-
- a.) This version of strings is roughly 4 to 9 times faster than the
- original one in the tests below. With a larger minimal string length
- it might even be 10 times faster.
- b.) The original one had several bugs.
- c.) This one is in the public domain, and you get the source.
-
- ad a.)
- Here are results of some tests:
-
- machine     OS          file        size | old: user  sys  total | new: user  sys  total
- PC-MX2      SINIX v2.1  /vmsinix  289084 |      43.6  1.1   44.7 |       3.8  2.3    6.1
- VAX 11/780  4.3BSD      /vmunix   329728 |      18.0  2.7   20.7 |       1.5  0.9    2.4
- SUN 3/260   SunOS 3.5   /vmunix   558359 |       6.5  0.6    7.1 |       1.6  0.2    1.8
- VAX 6800    Ultrix 2.1  /vmunix   662528 |       5.2  0.4    5.6 |       0.6  0.0    0.6
-
- User, sys and total times are in seconds; size is the size of the file
- in bytes.
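-
- The speedup on each machine is the old total time divided by the new
- one. The small sketch below (all names hypothetical) does that
- arithmetic on the totals from the table; the ratios come out at about
- 7.3 (PC-MX2), 8.6 (VAX 11/780), 3.9 (Sun 3/260) and 9.3 (VAX 6800).
-
```c
#include <stddef.h>

/* Total (user + sys) times in seconds, taken from the table above. */
struct timing {
    const char *machine;
    double old_total;
    double new_total;
};

static const struct timing timings[] = {
    { "PC-MX2",     44.7, 6.1 },
    { "VAX 11/780", 20.7, 2.4 },
    { "Sun 3/260",   7.1, 1.8 },
    { "VAX 6800",    5.6, 0.6 },
};

#define N_TIMINGS (sizeof timings / sizeof timings[0])

/* Speedup of the new strings over the old one on machine i. */
double speedup(size_t i)
{
    return timings[i].old_total / timings[i].new_total;
}
```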
-
- ad b.)
- The original strings
- - thinks control-L (0x0c) is a printable character;
- - under some circumstances thinks 0x80 is printable. The package
- contains a file, test_input; unpack it with atob. It then contains
- several lines of characters, including one line with a control-L and
- one with a 0x80. The original strings gets both cases wrong;
- - does not get the start address of the initialized data right on
- some systems;
- - has problems reading the standard input.
- The first two bugs have been found on 4.3BSD, SunOS and ULTRIX, the
- third only on MX2 SINIX v2.1.
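-
- The first two bugs come down to the test for "printable". Below is a
- minimal sketch of a test that rejects both problem bytes. It assumes
- printable means ASCII 0x20 through 0x7e (whether TAB should also count
- is not stated here) and takes the byte as an unsigned char, so that
- 0x80 cannot become a negative value on machines where plain char is
- signed. The function name is hypothetical.
-
```c
/* Hypothetical printable test.  Assumption: printable means ASCII
 * 0x20 (space) through 0x7e ('~').  Taking an unsigned char keeps
 * 0x80 in the range 0..255 even where plain char is signed. */
int is_printable_byte(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}
```
-
- Called as is_printable_byte((unsigned char) ch) on a plain char ch, it
- treats 0x80 the same on signed-char and unsigned-char machines.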
-
- 4.) What about bugs?
-
- If you find bugs, tell me. If you have fixed them, or have made an
- extension that really deserves the name, drop me a note.
-
- 5.) Notes
-
- This program is about 7 times faster than the original one.
- There are two reasons for this:
- - It does not use fgetc/fputc to get or put characters, but reads
- characters in blocks. It does not copy them but moves pointers around
- on the input buffer, so no procedure call is needed to get at each
- character. When a sequence is found, it is put into the output buffer
- in one block, so there is no need, as with fputc, to check for
- possible overflow on each character.
- - When the program searches for a sequence of printable characters,
- it examines only every min_str_len-th character instead of every one.
- min_str_len defaults to 4 and can be set with a command-line option
- such as "strings -3".
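-
- The two ideas above can be sketched together in C. This is a hedged
- reconstruction, not the code in strings.c: all names are hypothetical,
- the whole input is assumed to be in one buffer (runs that cross a block
- boundary are not handled), and printable is assumed to mean ASCII 0x20
- to 0x7e. The scan probes only every min_len-th byte. If a probed byte
- is not printable, no run of at least min_len bytes can start at or
- before it (past what was already scanned), so the search jumps beyond
- it; if it is printable, the scan walks backwards and forwards from it
- to find the whole run. A run that is long enough is copied to the
- output buffer with one space check and one memcpy, not one per byte.
-
```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Assumption: printable means ASCII 0x20 (space) through 0x7e ('~'). */
static int printable(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}

/* Output buffer, flushed to file descriptor 1 with write(2). */
#define OUT_SIZE 8192
static char out_buf[OUT_SIZE];
static size_t out_len;

static void out_flush(void)
{
    size_t done = 0;
    while (done < out_len) {
        ssize_t w = write(1, out_buf + done, out_len - done);
        if (w <= 0)
            break;                      /* write error: give up in this sketch */
        done += (size_t)w;
    }
    out_len = 0;
}

/* Append one string plus a newline: one space check per string,
 * not one per character. */
static void out_put(const unsigned char *p, size_t n)
{
    if (n + 1 > OUT_SIZE - out_len)
        out_flush();
    while (n + 1 > OUT_SIZE) {          /* rare: string longer than buffer */
        memcpy(out_buf, p, OUT_SIZE);
        out_len = OUT_SIZE;
        out_flush();
        p += OUT_SIZE;
        n -= OUT_SIZE;
    }
    memcpy(out_buf + out_len, p, n);
    out_len += n;
    out_buf[out_len++] = '\n';
}

/* Find every run of at least min_len printable bytes in buf[0..n),
 * probing only every min_len-th byte.  Runs are written to the output
 * buffer if emit is non-zero.  Returns the number of runs found. */
size_t scan_strings(const unsigned char *buf, size_t n, size_t min_len,
                    int emit)
{
    size_t start = 0, found = 0;        /* no unreported run starts before start */

    if (min_len == 0)
        min_len = 1;
    while (n - start >= min_len) {
        size_t probe = start + min_len - 1;
        if (!printable(buf[probe])) {   /* no long run can start <= probe */
            start = probe + 1;
            continue;
        }
        size_t s = probe, e = probe + 1;
        while (s > start && printable(buf[s - 1]))
            s--;                        /* walk back to the start of the run */
        while (e < n && printable(buf[e]))
            e++;                        /* walk forward to its end */
        if (e - s >= min_len) {
            found++;
            if (emit)
                out_put(buf + s, e - s);
        }
        start = e;                      /* buf[e] is non-printable, or e == n */
    }
    return found;
}
```
-
- A driver along these lines would read() the input in large blocks,
- call scan_strings(block, len, 4, 1) on each and call out_flush() once
- at the end; carrying a partial run over a block boundary is left out.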
-
- It can be sped up some more, but then it would be difficult to port
- to different systems. For example, the program currently takes 6.0
- seconds on MX2 for /vmsinix, while an improved version needs only 5.5
- seconds. The improved version is also much smaller: 6976 bytes
- compared to 10596.
- Ways to improve the program:
- - On some machines another way of testing whether a character is
- printable is faster. Currently the program uses an array (isp) and
- indexes it with a character cast to a (signed) integer; isp_mid is
- the base from which these offsets are computed.
- On MX2 and, if my tests are right, on VAX, it is faster to use
- unsigned characters as the index into this array.
- If you want to experiment with this, change CHAR_TYPE to
- 'unsigned byte' and define the macro IS_PRINTABLE as '(isp[c])'.
- - The basic type chosen for the isp array makes a difference, although
- a small one. On MX2 short is best, but char is nearly as good.
- - You can make the program smaller. It does not need stdio, but exit
- normally closes file descriptors and therefore pulls in a large part
- of the stdio code, about 4 KB on some systems.
- If you know what your exit does, you can substitute a suitable
- routine of your own. For example, on MX2 exit calls _cleanup, which
- only closes all open file descriptors. Since I know that only the
- standard descriptors are open at the end of the program, I can
- write a _cleanup that only closes 0, 1 and 2.
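-
- The two ways of indexing the isp table can be sketched side by side.
- This is a hypothetical reconstruction: isp and isp_mid follow the names
- above, but isp_signed, ISP_TYPE, the two macro names and isp_init() are
- invented here, and printable is again assumed to mean ASCII 0x20 to
- 0x7e. The signed scheme needs isp_mid so that negative indices (bytes
- 0x80 to 0xff read as signed chars) land inside the array; the unsigned
- scheme indexes the table directly.
-
```c
/* Hypothetical element type; the notes above report short as fastest
 * on MX2, with char nearly as good. */
typedef short ISP_TYPE;

/* Signed scheme: index with a value in -128..127, relative to isp_mid. */
static ISP_TYPE isp_signed[256];
static ISP_TYPE *const isp_mid = isp_signed + 128;
#define IS_PRINTABLE_SIGNED(c)   (isp_mid[(int)(c)])

/* Unsigned scheme: index with a value in 0..255, no offset needed. */
static ISP_TYPE isp[256];
#define IS_PRINTABLE_UNSIGNED(c) (isp[(unsigned char)(c)])

/* Fill both tables.  Assumption: printable means ASCII 0x20..0x7e, so
 * every negative (signed) index and every byte >= 0x80 maps to 0. */
void isp_init(void)
{
    for (int v = -128; v <= 127; v++)
        isp_mid[v] = (v >= 0x20 && v <= 0x7e);
    for (int v = 0; v <= 255; v++)
        isp[v] = (v >= 0x20 && v <= 0x7e);
}
```
-
- Both schemes give the same answers; which one is faster depends on the
- machine, as noted above.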
-
- The savings are almost invisible, they are not easily portable, and
- they require a certain amount of research on the part of the person
- doing the port. I chose not to give the program options to adjust
- these things.
-
- There are still some DEBUG statements in the code. You get them
- if you set DEBUGFLAGS=-DDEBUG.
-
- 6.) Status
-
- This program is placed in the public domain.
- The Copyright Notice in COPYING applies.
-
- Absorb, apply and enjoy,
- Michael Greim
-