home *** CD-ROM | disk | FTP | other *** search
-
-
- Run Length Encoding compressor program 8 bit header version
-
-
-
- Written by Shaun Case 1991 in Borland C++ 2.0
-
- with sizeof (int) == 2
-
-
-
- This program and its source code are Public Domain.
-
- This program should be portable to any machine with
-
- 2 byte short ints and 8 bit bytes, if you patch the
-
- filename stuff, which is ms-dos specific.
-
-
-
-
-
- What is run length encoding?
-
-
-
- Run Length Encoding, also known as RLE, is a method of compressing data
-
- that has a lot of "runs" of bytes (or bits) in it. A "run" is a series
-
- of bytes that are all the same. For instance, the string "THIS IS A
-
- VEEEEEEEEEEEEEEEEEEEEEEEERY INTERESTING SENTENCE" has a run of 23 'E's
-
- in it. This could be compressed in the following manner:
-
-
-
- THIS IS A V23ERY INTERESTING SENTENCE
-
-
-
- resulting in a savings of 20 characters. A further savings of one
-
- character can be realized if the sequence "23" is replaced by a single
-
- byte with the value 23.
-
-
-
- However, if the text to be encoded is arbitrary, then it may contain
-
- numbers as well as letters, and bytes of all possible values. For this
-
- reason, there must be some way to let the decoder know when a compressed
-
- run is encountered, and when a sequence to be passed straight through is
-
- encountered. For this reason, the following file format was used:
-
-
-
-
-
- ========= tech info =========
-
-
-
- 8 bit header version.
-
-
-
- File format:
-
-
-
- 13 byte original filename, followed by
-
-
-
- [ 8 bit header + data ][ 8 bit header + data ][ 8 bit header + data ]
-
- etc..
-
-
-
- header:
-
-
-
- bit 7 : 1 if following byte is a run
-
- bit 6 - 0 : legnth of run (max 127, min 3)
-
-
-
- data: 1 byte : which character run consists of
-
-
-
- *** OR ***
-
-
-
- header:
-
-
-
- bit 7 : 0 if following bytes are sequence
-
- bit 6 - 0 : legnth of sequence (max 127)
-
-
-
- data: (header AND 0x7F) bytes of data
-
- : data bytes copied to output stream unchanged
-
-
-
- ===============================
-
-
-
- bugs:
-
-
-
- None known
-
-
-
-
-
- Nasty features :
-
-
-
- 1) When encoder reaches max run length, it is written
-
- out correctly, but is followed by a 1 length run of
-
- the next byte. Odd. Reason unknown.
-
-
-
- 2) Better compression could be achieved by having min
-
- compression length and sequence length understood
-
- to be 2. This would allow an "understood" multiplication
-
- of the seq_len or run_len by 2, since 1 is never used,
-
- allowing sequences of 254 bytes. This is not likely
-
- to give much better compression in most cases,
-
- and is left as an exercise for the reader.
-
-
-
- Implementing this requires fixing 1 above, too.
-
-
-
-
-
-
-
-
-
- Author: atman%ecst.csuchico.edu@RELAY.CS.NET (internet)
-
- 1@9651 (WWIVnet)
-
- atman of 1:119/666.0 (fidonet)
-
-
-
-
-
- Tell me hi if you use this program!
-