CD Actual 15

home *** CD-ROM | disk | FTP | other *** search

/ CD Actual 15 / CDACTUAL15.iso / cdactual / program / basic / QB_RLE.ZIP / RLE.DOC < prev

Wrap

Text File | 1991-03-07 | 4.1 KB | 72 lines

I've seen quite a few messages in this text compression thread but for some reason I missed the original message. I have a couple of simple RLE routines to pass along which can be very effective in certain rather limited circumstances. I've seen mention of a bit packing scheme that you seem to be considering for compression purposes. You should check out the books PROGRAMMING PEARLS and MORE PROGRAMMING PEARLS by Jon Bently. One of these books contains a discussion of bit packing, and both contain a great many ideas worth thinking about. Program Name RLE.BAS Author Kenneth G. White Date 9/3/90 Language QuickBasic 4.x / PDS 7.x Purpose RLE Data Compression Functions I recently had a chance to write a couple of Run Length Encoding data compression functions, and I thought they might be of use or amusement to someone else. This compression method is among the most simple, and works by reducing redundant character sequences to a single character code and a character count code. As a result, any file with a large number of repeating characters can be reduced in size by a meaningful amount. The downside is that you must uncompress your data before use, and, if there are no repeating character sequences, the output file could be double the size of the input file. This makes your compression utility into an expansion utility. Fun, but not very useful. The functions in RLE.BAS are not designed to be a general purpose compression tool, so keep your current utilities at the ready. There are however, situations that come up in programming where some amount of data compression could be useful. Character based screen displays and other forms of text loaded separately into a program come to mind as possible candidates for this type of compression. The main limiting factor in using compressed data within a program is the speed with which it can be uncompressed. I've tried to make these routines as fast as possible within the limits of Basic. The functions are designed to deal with character codes 0-255 as input which makes them very flexible but slows them down a bit. This increased flexibility can also carry a high price in terms of compressed output size. A compression marker is used to signal the start of a code/count sequence in the compressed output, and this code requires special handling if it is allowed in the input stream. Encoding an input stream in which every other character is a compression marker will cause the output size to be double the input size. A file which contains no compression markers, or which has sequences of three or more compression markers at a time, has a worst case output size equal to the input size. In certain cases, changing the compression marker code can have a very large effect on the output size. Additionally, if you are sure that the input data will never include this marker character, then the special processing code used to deal with this possibility can be removed. Allowing the entire character set as input seemed to be a fair trade-off in terms of flexibility, but this is just a personal preference on my part. I tried these functions on various text files which contained formatted documents ,BBS messages, text screens, and random character sequences. The compression varied between 8% and 65%, with the best compression coming from ascii text screens and formatted documents. I would think somewhere between 10% and 20% might be about the best you could hope for in most ascii files. Interestingly, while BBS message files and the like, were compressed in the neighborhood or 10-15%, a text file containing NL player stats was reduced in size by 51%. Formatted program screens consisting of standard text and border characters can generally be compressed by as much as 65% in my tests. This seems to the most practical area for application of these functions.