home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
- sdbm - Substitute DBM
- or
- Berkeley ndbm for Every UN*X[1] Made Simple
-
- Ozan (oz) Yigit
-
- The Guild of PD Software Toolmakers
- Toronto - Canada
-
- oz@nexus.yorku.ca
-
-
-
- Implementation is the sincerest form of flattery. - L. Peter
- Deutsch
-
- A The Clone of the ndbm library
-
- The sources accompanying this notice - sdbm - consti-
- tute the first public release (Dec. 1990) of a complete
- clone of the Berkeley UN*X ndbm library. The sdbm library is
- meant to clone the proven functionality of ndbm as closely
- as possible, including a few improvements. It is practical,
- easy to understand, and compatible. The sdbm library is not
- derived from any licensed, proprietary or copyrighted
- software.
-
- The sdbm implementation is based on a 1978 algorithm
- [Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
- In the course of searching for a substitute for ndbm, I pro-
- totyped three different external-hashing algorithms [Lar78,
- Fag79, Lit80] and ultimately chose Larson's algorithm as a
- basis of the sdbm implementation. The Bell Labs dbm (and
- therefore ndbm) is based on an algorithm invented by Ken
- Thompson, [Tho90, Tor87] and predates Larson's work.
-
- The sdbm programming interface is totally compatible
- with ndbm and includes a slight improvement in database ini-
- tialization. It is also expected to be binary-compatible
- under most UN*X versions that support the ndbm library.
-
- The sdbm implementation shares the shortcomings of the
- ndbm library, as a side effect of various simplifications to
- the original Larson algorithm. It does produce holes in the
- page file as it writes pages past the end of file. (Larson's
- paper include a clever solution to this problem that is a
- result of using the hash value directly as a block address.)
- On the other hand, extensive tests seem to indicate that
- sdbm creates fewer holes in general, and the resulting page-
- files are smaller. The sdbm implementation is also faster
- than ndbm in database creation. Unlike the ndbm, the sdbm
- _________________________
-
- [1] UN*X is not a trademark of any (dis)organization.
-
-
-
-
-
-
-
-
-
- - 2 -
-
-
- store operation will not ``wander away'' trying to split its
- data pages to insert a datum that cannot (due to elaborate
- worst-case situations) be inserted. (It will fail after a
- pre-defined number of attempts.)
-
- Important Compatibility Warning
-
- The sdbm and ndbm libraries cannot share databases: one
- cannot read the (dir/pag) database created by the other.
- This is due to the differences between the ndbm and sdbm
- algorithms[2], and the hash functions used. It is easy to
- convert between the dbm/ndbm databases and sdbm by ignoring
- the index completely: see dbd, dbu etc.
-
-
- Notice of Intellectual Property
-
- The entire sdbm library package, as authored by me, Ozan S.
- Yigit, is hereby placed in the public domain. As such, the
- author is not responsible for the consequences of use of
- this software, no matter how awful, even if they arise from
- defects in it. There is no expressed or implied warranty for
- the sdbm library.
-
- Since the sdbm library package is in the public domain,
- this original release or any additional public-domain
- releases of the modified original cannot possibly (by defin-
- ition) be withheld from you. Also by definition, You (singu-
- lar) have all the rights to this code (including the right
- to sell without permission, the right to hoard[3] and the
- right to do other icky things as you see fit) but those
- rights are also granted to everyone else.
-
- Please note that all previous distributions of this
- software contained a copyright (which is now dropped) to
- protect its origins and its current public domain status
- against any possible claims and/or challenges.
-
- Acknowledgments
-
- Many people have been very helpful and supportive. A
- partial list would necessarily include Rayan Zacherissen
- (who contributed the man page, and also hacked a MMAP
- _________________________
-
- [2] Torek's discussion [Tor87] indicates that
- dbm/ndbm implementations use the hash value to traverse
- the radix trie differently than sdbm and as a result,
- the page indexes are generated in different order. For
- more information, send e-mail to the author.
- [3] You cannot really hoard something that is avail-
- able to the public at large, but try if it makes you
- feel any better.
-
-
-
-
-
-
-
-
-
-
- - 3 -
-
-
- version of sdbm), Arnold Robbins, Chris Lewis, Bill David-
- sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me
- started in the first place), Johannes Ruschein (who did the
- minix port) and David Tilbrook. I thank you all.
-
- Distribution Manifest and Notes
-
- This distribution of sdbm includes (at least) the following:
-
- CHANGES change log
- README this file.
- biblio a small bibliography on external hashing
- dba.c a crude (n/s)dbm page file analyzer
- dbd.c a crude (n/s)dbm page file dumper (for conversion)
- dbe.1 man page for dbe.c
- dbe.c Janick's database editor
- dbm.c a dbm library emulation wrapper for ndbm/sdbm
- dbm.h header file for the above
- dbu.c a crude db management utility
- hash.c hashing function
- makefile guess.
- pair.c page-level routines (posted earlier)
- pair.h header file for the above
- readme.ms troff source for the README file
- sdbm.3 man page
- sdbm.c the real thing
- sdbm.h header file for the above
- tune.h place for tuning & portability thingies
- util.c miscellaneous
-
- dbu is a simple database manipulation program[4] that
- tries to look like Bell Labs' cbt utility. It is currently
- incomplete in functionality. I use dbu to test out the rou-
- tines: it takes (from stdin) tab separated key/value pairs
- for commands like build or insert or takes keys for commands
- like delete or look.
-
- dbu <build|creat|look|insert|cat|delete> dbmfile
-
- dba is a crude analyzer of dbm/sdbm/ndbm page files. It
- scans the entire page file, reporting page level statistics,
- and totals at the end.
-
- dbd is a crude dump program for dbm/ndbm/sdbm data-
- bases. It ignores the bitmap, and dumps the data pages in
- sequence. It can be used to create input for the dbu util-
- ity. Note that dbd will skip any NULLs in the key and data
- fields, thus is unsuitable to convert some peculiar
- _________________________
-
- [4] The dbd, dba, dbu utilities are quick hacks and
- are not fit for production use. They were developed
- late one night, just to test out sdbm, and convert some
- databases.
-
-
-
-
-
-
-
-
-
- - 4 -
-
-
- databases that insist in including the terminating null.
-
- I have also included a copy of the dbe (ndbm DataBase
- Editor) by Janick Bergeron [janick@bnr.ca] for your pleas-
- ure. You may find it more useful than the little dbu util-
- ity.
-
- dbm.[ch] is a dbm library emulation on top of ndbm (and
- hence suitable for sdbm). Written by Robert Elz.
-
- The sdbm library has been around in beta test for quite
- a long time, and from whatever little feedback I received
- (maybe no news is good news), I believe it has been func-
- tioning without any significant problems. I would, of
- course, appreciate all fixes and/or improvements. Portabil-
- ity enhancements would especially be useful.
-
- Implementation Issues
-
- Hash functions: The algorithm behind sdbm implementa-
- tion needs a good bit-scrambling hash function to be effec-
- tive. I ran into a set of constants for a simple hash func-
- tion that seem to help sdbm perform better than ndbm for
- various inputs:
-
- /*
- * polynomial conversion ignoring overflows
- * 65599 nice. 65587 even better.
- */
- long
- dbm_hash(char *str, int len) {
- register unsigned long n = 0;
-
- while (len--)
- n = n * 65599 + *str++;
- return n;
- }
-
- There may be better hash functions for the purposes of
- dynamic hashing. Try your favorite, and check the pagefile.
- If it contains too many pages with too many holes, (in rela-
- tion to this one for example) or if sdbm simply stops work-
- ing (fails after SPLTMAX attempts to split) when you feed
- your NEWS history file to it, you probably do not have a
- good hashing function. If you do better (for different
- types of input), I would like to know about the function you
- use.
-
- Block sizes: It seems (from various tests on a few
- machines) that a page file block size PBLKSIZ of 1024 is by
- far the best for performance, but this also happens to limit
- the size of a key/value pair. Depending on your needs, you
- may wish to increase the page size, and also adjust PAIRMAX
- (the maximum size of a key/value pair allowed: should always
-
-
-
-
-
-
-
-
-
- - 5 -
-
-
- be at least three words smaller than PBLKSIZ.) accordingly.
- The system-wide version of the library should probably be
- configured with 1024 (distribution default), as this appears
- to be sufficient for most common uses of sdbm.
-
- Portability
-
- This package has been tested in many different UN*Xes
- even including minix, and appears to be reasonably portable.
- This does not mean it will port easily to non-UN*X systems.
-
- Notes and Miscellaneous
-
- The sdbm is not a very complicated package, at least
- not after you familiarize yourself with the literature on
- external hashing. There are other interesting algorithms in
- existence that ensure (approximately) single-read access to
- a data value associated with any key. These are directory-
- less schemes such as linear hashing [Lit80] (+ Larson varia-
- tions), spiral storage [Mar79] or directory schemes such as
- extensible hashing [Fag79] by Fagin et al. I do hope these
- sources provide a reasonable playground for experimentation
- with other algorithms. See the June 1988 issue of ACM Com-
- puting Surveys [Enb88] for an excellent overview of the
- field.
-
- References
-
-
- [Lar78]
- P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18, pp.
- 184-201, 1978.
-
- [Tho90]
- Ken Thompson, private communication, Nov. 1990
-
- [Lit80]
- W. Litwin, `` Linear Hashing: A new tool for file and
- table addressing'', Proceedings of the 6th Conference on
- Very Large Dabatases (Montreal), pp. 212-223, Very
- Large Database Foundation, Saratoga, Calif., 1980.
-
- [Fag79]
- R. Fagin, J. Nievergelt, N. Pippinger, and H. R.
- Strong, ``Extendible Hashing - A Fast Access Method for
- Dynamic Files'', ACM Trans. Database Syst., vol. 4,
- no.3, pp. 315-344, Sept. 1979.
-
- [Wal84]
- Rich Wales, ``Discussion of "dbm" data base system'',
- USENET newsgroup unix.wizards, Jan. 1984.
-
- [Tor87]
- Chris Torek, ``Re: dbm.a and ndbm.a archives'',
-
-
-
-
-
-
-
-
-
- - 6 -
-
-
- USENET newsgroup comp.unix, 1987.
-
- [Mar79]
- G. N. Martin, ``Spiral Storage: Incrementally Augment-
- able Hash Addressed Storage'', Technical Report #27,
- University of Varwick, Coventry, U.K., 1979.
-
- [Enb88]
- R. J. Enbody and H. C. Du, ``Dynamic Hashing
- Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp.
- 85-113, June 1988.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-