- Newsgroups: comp.compression
- Path: sparky!uunet!stanford.edu!agate!rsoft!mindlink!a7657
- From: a7657@mindlink.bc.ca (Stephen H. Kawamoto)
- Subject: Re: compression of small files
- Organization: MIND LINK! - British Columbia, Canada
- Date: Thu, 12 Nov 1992 23:20:45 GMT
- Message-ID: <17407@mindlink.bc.ca>
- Sender: news@deep.rsoft.bc.ca (Usenet)
- Lines: 63
-
- > work well: means a good compression ratio, speed is unimportant.
- >
- > small files: means 1024 bytes or less.
- >
- > Or would a file of such a size be too small to effectively compress
- > ?
- Would this be repetitive data as in small database or spreadsheet files,
- or text? The minimal size would still have to be greater than 14 bytes.
- (My tests seem to indicate that the minimal size for a file with repetitive
- data to effectively compress is about 14 bytes for LHA and ARJ but 7 bytes
- for PKZIP.)
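A rough way to repeat this kind of test today is sketched below. It uses Python's zlib module to produce a raw DEFLATE stream, which is only a stand-in for the 1992 versions of LHA, ARJ and PKZIP, so the threshold it finds need not match the 14-byte and 7-byte figures above. The function names raw_deflate and smallest_shrinking_run are made up for this illustration.

```python
import zlib

def raw_deflate(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (no zlib or gzip wrapper)."""
    # A negative wbits value asks zlib to omit its header and checksum,
    # so only the compressed data itself is counted.
    comp = zlib.compressobj(level=9, wbits=-15)
    return comp.compress(data) + comp.flush()

def smallest_shrinking_run(max_len: int = 64):
    """Return the shortest run of 'A' bytes whose raw DEFLATE output
    is smaller than the input, or None if none is found up to max_len."""
    for n in range(1, max_len + 1):
        if len(raw_deflate(b"A" * n)) < n:
            return n
    return None

if __name__ == "__main__":
    n = smallest_shrinking_run()
    print("shortest repetitive input that shrinks:", n, "bytes")
```

This counts only the compressed data stream. Archive headers are a separate cost and are not included.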
-
- What this suggests is that the compression algorithms inside LHA and ARJ are
- probably the same with minute variations, and that the smallest size in which
- they can represent a stream of characters is 13 bytes. PKZIP, on the other
- hand, works with a shorter stream of characters, and the smallest size in
- which it can represent a group of data characters is 6 bytes.
-
- Of course, the overhead of these methods leaves us with large PKZIP files, due
- to the large header, and smaller ARJ and LHA files, even when the file is
- below that minimal size (less than 14 bytes).
-
- Thus it stands to reason that there has to be a minimal file size that doesn't
- result in an INCREASE in file size for the resulting compressed file. For
- LHA, that is actually about 42 bytes; for ARJ, 110; and for PKZIP, about 120.
- (This information is based on tests using a test file with 6 bytes of data
- (AAAA plus CR/LF) created by ECHO AAAA>test.txt and compressing the resulting
- file with each of the utilities, LHA, ARJ and PKZIP. The resulting file sizes
- were used to determine the actual minimal file size. Note that perhaps a file
- with a size 6 bytes less than given above might be the absolute minimal. What
- this means is that LHA has a 36-byte header for an 8-character filename with
- 6 bytes in the file; ARJ, a 104-byte header; and PKZIP, a 114-byte header.
- File sizes also vary according to the length of the filename and the
- inclusion|exclusion of pathnames in the directory structure of the resulting
- compressed file.)
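The arithmetic behind those header figures can be written out as a small Python sketch. It assumes the 6-byte test file ends up in the archive at roughly its original size, so that header size is simply archive size minus 6. The names PAYLOAD_BYTES, header_overhead and worth_archiving are made up for this illustration.

```python
# Archive sizes reported in the post for the 6-byte test file.
ARCHIVE_SIZES = {"LHA": 42, "ARJ": 110, "PKZIP": 120}

# Assumption: "AAAA" plus CR LF, i.e. 6 bytes of file data.
PAYLOAD_BYTES = 6

def header_overhead(archive_size: int, payload: int = PAYLOAD_BYTES) -> int:
    """Bytes in the archive that are not file data."""
    return archive_size - payload

def worth_archiving(original: int, compressed: int, overhead: int) -> bool:
    """True when compressed data plus header is smaller than the original."""
    return compressed + overhead < original

if __name__ == "__main__":
    for tool, size in ARCHIVE_SIZES.items():
        print(tool, header_overhead(size))  # 36, 104 and 114 respectively
```

In other words, a file only gets smaller overall when the compressor saves more bytes than the header costs.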
-
- Of course, there are probably documents with better scientific method than
- the ones I've employed (minimal ones in my case). The smallest size of a file
- does depend on the filename length, inclusion|exclusion of pathnames as well
- as the size of the file itself.
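The dependence on filename length can be checked with a short Python sketch. It uses the standard zipfile module as a modern stand-in for PKZIP, which is an assumption: the 1992 utility may have written slightly different headers. The ZIP format keeps a 30-byte local header, a 46-byte central directory entry and a 22-byte end record, and stores the filename twice, so an 8-character name gives 30 + 46 + 22 + 16 = 114 bytes, which agrees with the PKZIP header figure quoted above. The helper name zip_archive_size is hypothetical.

```python
import io
import zipfile

def zip_archive_size(name: str, payload: bytes,
                     method: int = zipfile.ZIP_STORED) -> int:
    """Build a one-file ZIP archive in memory and return its size in bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())

if __name__ == "__main__":
    payload = b"AAAA\r\n"  # what DOS "ECHO AAAA>test.txt" writes
    for name in ("a.txt", "test.txt", "longername.txt"):
        size = zip_archive_size(name, payload)
        # Overhead here means everything in the archive that is not file data.
        print(name, "archive:", size, "overhead:", size - len(payload))
```

ZIP_STORED is used so that only the container overhead is measured. Passing zipfile.ZIP_DEFLATED instead shows what the compressor adds or saves on top of that.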
-
- So the answer to your question: Or would a file of such a size be too small
- to effectively compress ?
-
- NO.
-
-
-
- --
- PGP 384/D7484F Stephen Kawamoto
-
- a7657@mindlink.bc.ca.
-
-
- UUENCODED on MIND LINK! Fri Oct 30 10:35:08 1992
-
-
- begin 666 sign1a.pcx
- M"@4!`0`````K`3L`+`$\`````/___P``````````````````````````````
- M```````````````````````````!)@``````````````````````````````
- M``````````````````````````````````````````````````#E_\'PY?_!
- M\.7_P?#E_\'PY?_!\.7_P?#$_\'/X/_!\,3_P<_@_\'PQ/_!Q\W_/]+_P?#$
- M_\''S?\
-