NetNews Usenet Archive 1992 #16

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #16 / NN_1992_16.iso / spool / comp / compress / 2826 < prev next >

Wrap

Internet Message Format | 1992-07-27 | 5.4 KB

Path: sparky!uunet!mcsun!uknet!acorn!armltd!dseal From: dseal@armltd.co.uk (David Seal) Newsgroups: comp.compression Subject: Re: PERPETUAL MOTION Message-ID: <4675@armltd.uucp> Date: 27 Jul 92 14:49:16 GMT References: <62692@cup.portal.com> Sender: dseal@armltd.uucp Distribution: comp Organization: A.R.M. Ltd, Swaffham Bulbeck, Cambs, UK Lines: 108 In article <62692@cup.portal.com> ts@cup.portal.com (Tim W Smith) writes: >Most impossible compressors claim to only work on files past a certain >size. Let that size be M bits. Let F(X) be the total number of files >of X bits or less. Let f(X) be the total number of files of X bits. > >Assume we have some ordering of all files. With each file, associate >a number, which is the position of the file in this ordering relative >to the first file of the same size. > >Now consider the following compressor. For files of length M+1, the >first F(M) of them are mapped to files of length M or less. The rest >of the files of length M+1 are mapped to the files whose number is >F(M) less than theirs. > >This leaves us F(M) files of length M+1 that do not have files mapped >onto them. > >We can now do a similar thing for files of length M+2. ... > >Repeat forever. At each step, there are F(M) files that compress, and >applying this to a file of size N at most f(N)/F(M) times will compress >it. > >I don't recall all the WEB claims, but I think that this satisfies them >in spirit. > >... > >However, I believe that it *DOES* provide a counterexample to the >usual counting argument. It does *NOT* provide such a counterexample. The counting argument *only* applies to lossless compression methods. Your methods are lossy: the counting argument simply doesn't apply to them, and so they cannot be counterexamples to the counting argument. To see why your methods are lossy: A single application of the method outlined above is lossy because it has got to do *something* with input files of length M or less. The claims about always compressing only apply to files of length greater than M, so such input files don't need to be compressed: in fact, they may be expanded drastically if you want. But they do have to map to *some* output file. But the method uses all possible output files to represent the input files of length greater than M: whatever output file you choose for any input file of length M or less has already been used for a file of length greater than M. So the method maps two distinct input files to the same output file, and so is lossy. If you like, you can refuse to apply the method to input files of length M or less. But if you're then given a "compressed" file of length M or less, you cannot tell whether the original file was identical to the output file or was the file of length M+1 that compressed down to the output file. Again, the method is lossy. You can get around this by adding an extra bit to each output file, to say whether it needs to be decompressed or is a verbatim length M or less file. This is lossless, but unfortunately this extra bit has more than negated all the compression you've achieved... Now to deal with the "multiple applications, until the file is length M or less" technique: the length M or less output file is not enough to reconstruct the input file, so the technique is lossy. To make it lossless, you also need to record the number of times that you need to apply the decompression algorithm to it. Unfortunately, this number is liable to be rather big: so big in fact that any technique for adding some representation of it to the file will either result in some files having been increased in size or in all files being restored to their original size... To summarise: * The counting argument applies only to lossless techniques. * A compression technique must specify what it does to all possible input files - even if you're only interested in what it does to files greater than a certain length. * A compression technique is lossless if its output file *always* contains enough information to reproduce the input file (using a fixed decompression algorithm and without relying on any information surreptitiously hidden elsewhere than in the output file). It is lossy otherwise: in particular, if two distinct input files produce the same output file, it is lossy. * For a "apply compression algorithm X until condition C is true" compression technique to be lossless, the decompression program must not merely be capable of applying decompression algorithm X; it must also be capable of establishing *how many times* to apply decompression algorithm X. A general suggestion: if you seriously think you've found a counterexample to the counting argument, may I suggest you follow it through to its logical conclusion? Such a counterexample would result in a proof that two distinct integers are equal to each other. If you've got a valid proof of this form, you could: (1) Make vast amounts of money by borrowing the smaller of these two integers, observing that what you have got is equal to the larger, paying back the loan out of the larger, and pocketing the difference... (2) Become famous as the mathematician who established the inconsistency of practically all known mathematics... Doing either of these successfully would be far more profitable to you personally than simply posting to Usenet :-) :-) :-) David Seal dseal@armltd.co.uk All opinions are mine only...