home *** CD-ROM | disk | FTP | other *** search
- I've seen quite a few messages in this text compression thread but for
- some reason I missed the original message. I have a couple of simple
- RLE routines to pass along which can be very effective in certain
- rather limited circumstances. I've seen mention of a bit packing
- scheme that you seem to be considering for compression purposes. You
- should check out the books PROGRAMMING PEARLS and MORE PROGRAMMING
- PEARLS by Jon Bently. One of these books contains a discussion of bit
- packing, and both contain a great many ideas worth thinking about.
-
- Program Name RLE.BAS
- Author Kenneth G. White
- Date 9/3/90
- Language QuickBasic 4.x / PDS 7.x
- Purpose RLE Data Compression Functions
-
- I recently had a chance to write a couple of Run Length Encoding
- data compression functions, and I thought they might be of use or
- amusement to someone else. This compression method is among the
- most simple, and works by reducing redundant character sequences to
- a single character code and a character count code. As a result,
- any file with a large number of repeating characters can be reduced
- in size by a meaningful amount. The downside is that you must
- uncompress your data before use, and, if there are no repeating
- character sequences, the output file could be double the size of
- the input file. This makes your compression utility into an
- expansion utility. Fun, but not very useful.
-
- The functions in RLE.BAS are not designed to be a general purpose
- compression tool, so keep your current utilities at the ready.
- There are however, situations that come up in programming where
- some amount of data compression could be useful. Character based
- screen displays and other forms of text loaded separately into a
- program come to mind as possible candidates for this type of
- compression. The main limiting factor in using compressed data
- within a program is the speed with which it can be uncompressed.
- I've tried to make these routines as fast as possible within the
- limits of Basic.
-
- The functions are designed to deal with character codes 0-255 as
- input which makes them very flexible but slows them down a bit.
- This increased flexibility can also carry a high price in terms of
- compressed output size. A compression marker is used to signal the
- start of a code/count sequence in the compressed output, and this
- code requires special handling if it is allowed in the input
- stream. Encoding an input stream in which every other character is
- a compression marker will cause the output size to be double the
- input size. A file which contains no compression markers, or which
- has sequences of three or more compression markers at a time, has a
- worst case output size equal to the input size. In certain cases,
- changing the compression marker code can have a very large effect
- on the output size. Additionally, if you are sure that the input
- data will never include this marker character, then the special
- processing code used to deal with this possibility can be removed.
- Allowing the entire character set as input seemed to be a fair
- trade-off in terms of flexibility, but this is just a personal
- preference on my part.
-
- I tried these functions on various text files which contained
- formatted documents ,BBS messages, text screens, and random
- character sequences. The compression varied between 8% and 65%,
- with the best compression coming from ascii text screens and
- formatted documents. I would think somewhere between 10% and 20%
- might be about the best you could hope for in most ascii files.
- Interestingly, while BBS message files and the like, were
- compressed in the neighborhood or 10-15%, a text file containing NL
- player stats was reduced in size by 51%. Formatted program screens
- consisting of standard text and border characters can generally be
- compressed by as much as 65% in my tests. This seems to the most
- practical area for application of these functions.
-
-