home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!darwin.sura.net!mips!mips!public!rem
- From: rem@public.BTR.COM (Robert E. Maas rem@btr.com)
- Newsgroups: comp.compression
- Subject: Re: digraph compression
- Message-ID: <7511@public.BTR.COM>
- Date: 26 Jul 92 20:51:21 GMT
- References: <1992Jul22.230017.1136@athena.mit.edu>
- Organization: BTR Public Access UNIX, MtnView CA. Contact: Customer Service cs@BTR.COM
- Lines: 43
-
- <<I would like to compress a directed graph, but still be able to
- determine whether one node points to another directly from the
- compressed file. Any suggestions?>>
-
- Before your question can be answered, a lot of ambiguities need to be
- clarified: Do you have ONLY a set of nodes and directional links, or
- are there labels on the nodes and/or the links? Are the labels strings
- of text or numbers (integers)? Is there any other information
- associated with the nodes or links? (I.e. is this a database with links
- between various records, or just a mathematical graph with no
- information content other than its structure?) If the nodes have no
- labels whatsoever, not even numbers, how do you ask questions about
- whether one node links to another or not? What sorts of questions about
- links do you specifically need to answer efficiently? (Given x and y,
- is there a link from x to y; Given x, what are all y such that there's
- a link from x to y; Given y, what are all x such that there's a link
- from x to y.) Depending on which kinds of questions you want to answer,
- and what the labels on nodes are, various different kinds of hash
- tables may be suitable for quickly looking up the answers to those
- questions. How many nodes and links are there altogether in your
- directed graph? What is the maximum number of characters in a label?
- What is the average and maximum amount of data associated with any
- node? Maybe if you have a mathematical graph with no labels and no
- information content, maybe you just want a way to abstract a given
- graph down to the minimum by which you can compare it to another, in
- which case hashing the whole graph down to a single 32-bit number would
- suffice. (But that doesn't sound like what you want here, right?)
-
- If your graph has labels on nodes, and a large number of nodes, and
- only a small fraction of possible links exist, and you want to answer
- only YES/NO questions whether x->y is a link, a double-hashing on the
- two keys with a simple yes/no answer would suffice. The most efficient
- way storagewise to do this may be to have a bitmap which is by default
- all zeroes but where each x->y link causes several different
- pseudo-random bits to be turned to 1 based on the hash number that was
- computed. Then to find if a given link appears you check ALL the bits
- associated with that x->y link and if ALL are on then the link probably
- exists while if not all are on then the link definitely doesn't exist.
- If the graph is static, you can compile it once, then run a background
- process that generates all possible x->y pairs to see if there are any
- false matches, and collect a list of those exceptions. Then discard the
- original data (but keep a backup somewhere of course) and use the
- bitmap plus the exception list to answer all x->y YES/NO queries.
-