World of A1200

home *** CD-ROM | disk | FTP | other *** search

/ World of A1200 / World_Of_A1200.iso / programs / compress / misc / xpk / docs / huff.doc < prev next >

Wrap

Text File | 1995-02-27 | 10KB | 222 lines

HUFF A dynamic huffman cruncher/decruncher Version 0.62 Copyright 1992 by M.Zimmermann License/Disclaimer ------------------ This library may be freely distributed with the XPK compression package, as long as it is kept in its original, complete, and unmodified form. It may not be distributed by itself or in a commercial package of any kind without my permission. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. About Huffmans -------------- The idea of a huffman crunch is as follows: often used bytes (ie 8 bit codes) get a shorter code than 8 bits, fi 5 bits. So everytime one of these bytes occurs in the source file I save (in this example) 3 bits in the dest file. To find out the optimum codes for each byte there is a simple method: A tree is to be constructed so that the path from the root to each leaf reflects the optimum (ie huffman) code. Unfortunately most computers (the Amiga, too) are byte-oriented, which means a rather complex handling of codes that are not a multiple of 8 bits. This results in pretty slow compression & decompression. So this means that the xpkHUFF.library probably won't be used for normal circumstances, but, as Dominik stated, it may serve well as an example library. There are three different huffman crunch algorithms known: · static compression/decompression · dynamic compression/decompression · adaptive compression/decompression What are the differences? The static huffman uses a fix table to represent each byte of the source. This, of course, makes sense only, if the structure of the data to be crunched is known. In this case (for instance crunching an english text) a fix table of codes is embedded in the code. Crunching other data than what the table was generated for will probably result in worse compression or even expansion. This is what a dynamic huffman is avoiding: it first creates a statistics table reflecting the frequency every byte occurs with and generates a special tree/table for this case, so the dynamic huffman does a good compression for this special data. But there is something that can be improved, anyway: imagine, there is a data block which contains many 'a's in it's first part and many 'z's in the last part.... The dynamic huffman would represent the 'a's and 'z's with short codes, of course. But it probably would be even better if the crunch/decrunch tree would reflect the *current* data beeing processed instead of the whole block, thus in resulting shorter codes for 'a' and 'z', depending of the position in the data block. This is what an adaptive huffman deals with: it creates the tree while crunching, without doing any statistics or tree creation before crunching. It permanently updates it's internal huffman tree. Therefore it doesn't have to include the information about the decrunch tree in the crunched data. Final words about huffmans ... ------------------------------ A stand-alone huffman will never achieve crunch results as fine as those reached with most other crunchers, for these will not only regard the number of occurances for each byte (as huffman does), but sequels of byte, too. This means: If you create all permutations of a datablock, the huffman crunched will always have the same length. Others won't, as they are depending on the order of the crunched data, too. Description ----------- The library 'xpkHUFF.library' implements a dynamic huffman crunch algorithm, even though the adaptive might result in slightly better crunch results. However, this is more complex to implement and I'm using a maximum buffer size of 64K, so this is a little bit like an adaptive huffman for large files. If I should have lots of spare time I will probably implement an adaptive huffman crunch algorithm. This new library will be called xpkHUFF, too, and new xpkHUFF.libraries will always handle output generated by earlier versions. The xpkHUFF.library supports a totally unsafe (but a little bit better than simple eor :-) encryption. Please note that crunch/decrunch speeds decrease when encryption is used. The source ... -------------- The source is provided, so you might have a look at different decrunch codes. Please note: The only code I tested intensively is the byte/cache decrunch code, which was used to create the library included in this package. For it's the fasted code, you probably won't want to use another one. However, this might help you to understand what huffmans are about. If you have never written a huffman, have a look at it, a huffman code is a pretty good idea. If you *have* written a huffman code and it's *faster* than mine, please let me know, I will then update xpkHUFF.library. Implementation -------------- If you should see an errormessage saying output buffer too small while crunching *and* encrypting, this means you tried to crunch and encrypt a file that would crunched and encrypted be larger than the original file. This should occur only with very small files (for I have a minimum file size due to tables) or with files that have been crunched already and therefore would expand during crunch. A technical note: this could also happen, if the last chunk of a file to be crunched/encrypted would be dimensioned too small by xpkmaster.library. However, in this case you cannot encrypt the file. I know this could be annoying and will think about a solution for this problem, but remember: this encryption would not be safe, better if you used FEAL or IDEA for secure encryption. Statistics ---------- Following is a table briefly listing some comparative statistics for HUFF without encryption. These were generated by xBench on the standard XPK benchmark system (A3000/25 with SCRAM, using the AmigaVision executable as data). Note that memory needs don't include xpkmaster.library's buffers. Method Packing Unpacking Packing Unpacking Compression Memory Memory Speed Speed Ratio ------ ------- --------- ------- --------- ----------- HUFF 30K 71K 88 K/s 138 K/s 24.1% Where unpack speed varies depending on decrunch code (refer to source for that). Last words ... -------------- I tried hard to debug this library with range checking while writing bytes on crunching, and so on, but as in every code larger than, say 10 lines :-), there will be bugs. I don't know any bugs in this version, but if you should meet one, please let me know via email (refer to end of this document for my email adr). As usually, reproducable bugs are preferred. Please add your configuration, programs running (best if you try without startup-sequence!), and, most important of all, add the file you tried to crunch! Thank you. Version History --------------- ; V 0.1 - 12-Jul-1992 : first version ; V x.yy - 18-Jul-1992 : first OK version ; V x.yy - 19-Jul-1992 : sped up decrunching ; V x.yy - 21-Jul-1992 : bug fixed in word/long decrunching: min pack ; chunk size now 3/5 ; V x.yy - 21-Jul-1992 : replaced many subq/bxx with dbf (ie sped up ; crunching a little bit), bug fixed: there was ; a dbf counter wrong (one of my favorite 0/1 ; problem bugs) ; V 0.50 - 29-Jul-1992 : added 68030+ cache optimized decrunch code ; V 0.51 - 01-Aug-1992 : byte decrunch improved, first code added, ; indicator byte for crunchmethod used added, ; 68030+ chache optimized code does not make ; sense any more, since byte decrunch fits to ; cache completely, now ; V 0.52 - 01-Aug-1992 : unsafe encryption supported ; V 0.53 - 03-Aug-1992 : slight improvements made to crunch code ; (+ 6K/s) ; V 0.54 - 03-Aug-1992 : inconsistence in expansion handling fixed ; V 0.55 - 03-Aug-1992 : bug fixed: expansion handling now considers ; table creation, too ; V 0.56 - 03-Aug-1992 : bug fixed: HUFF now can crunch files ; consisting of always the same byte (shame ; on me, termination criterium was wrong) ; V 0.57 - 03-Aug-1992 : Tree creation code partially rewritten ; V 0.58 - 05-Aug-1992 : bug fixed: wrong termination criterium for ; expansion check (my favorite 0/1 problem) ; V 0.59 - 06-Aug-1992 : now decrypting in a special buffer, not using ; InBuf (this is read only, I was told) any more ; V 0.60 - 07-Aug-1992 : added extra memory required during ; packing/unpacking ; V 0.61 - 08-Aug-1992 : expansion check changed, renamed from ; xpkDHUF.library to xpkHUFF.library thus ; corresponding to the possibility of handling ; adaptive huffman codes later without having ; an inconsistence in the name ; V 0.62 - 10-Aug-1992 : Flag XPKIF_MODES removed (I don't have modes ; yet (but I have a mapping code :-=)) Contact Address --------------- zimmerma@helios.ibr.cs.tu-bs.de (M. Zimmermann)