NetNews Usenet Archive 1992 #16

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #16 / NN_1992_16.iso / spool / comp / compress / 2809 < prev next >

Wrap

Internet Message Format | 1992-07-26 | 3.1 KB

Path: sparky!uunet!darwin.sura.net!mips!mips!public!rem From: rem@public.BTR.COM (Robert E. Maas rem@btr.com) Newsgroups: comp.compression Subject: Re: digraph compression Message-ID: <7511@public.BTR.COM> Date: 26 Jul 92 20:51:21 GMT References: <1992Jul22.230017.1136@athena.mit.edu> Organization: BTR Public Access UNIX, MtnView CA. Contact: Customer Service cs@BTR.COM Lines: 43 <<I would like to compress a directed graph, but still be able to determine whether one node points to another directly from the compressed file. Any suggestions?>> Before your question can be answered, a lot of ambiguities need to be clarified: Do you have ONLY a set of nodes and directional links, or are there labels on the nodes and/or the links? Are the labels strings of text or numbers (integers)? Is there any other information associated with the nodes or links? (I.e. is this a database with links between various records, or just a mathematical graph with no information content other than its structure?) If the nodes have no labels whatsoever, not even numbers, how do you ask questions about whether one node links to another or not? What sorts of questions about links do you specifically need to answer efficiently? (Given x and y, is there a link from x to y; Given x, what are all y such that there's a link from x to y; Given y, what are all x such that there's a link from x to y.) Depending on which kinds of questions you want to answer, and what the labels on nodes are, various different kinds of hash tables may be suitable for quickly looking up the answers to those questions. How many nodes and links are there altogether in your directed graph? What is the maximum number of characters in a label? What is the average and maximum amount of data associated with any node? Maybe if you have a mathematical graph with no labels and no information content, maybe you just want a way to abstract a given graph down to the minimum by which you can compare it to another, in which case hashing the whole graph down to a single 32-bit number would suffice. (But that doesn't sound like what you want here, right?) If your graph has labels on nodes, and a large number of nodes, and only a small fraction of possible links exist, and you want to answer only YES/NO questions whether x->y is a link, a double-hashing on the two keys with a simple yes/no answer would suffice. The most efficient way storagewise to do this may be to have a bitmap which is by default all zeroes but where each x->y link causes several different pseudo-random bits to be turned to 1 based on the hash number that was computed. Then to find if a given link appears you check ALL the bits associated with that x->y link and if ALL are on then the link probably exists while if not all are on then the link definitely doesn't exist. If the graph is static, you can compile it once, then run a background process that generates all possible x->y pairs to see if there are any false matches, and collect a list of those exceptions. Then discard the original data (but keep a backup somewhere of course) and use the bitmap plus the exception list to answer all x->y YES/NO queries.