Compression - Creating Reasonable Size Files for Your Images

Digital image files contain discrete information about every single pixel. Considering that each pixel generally requires 24 bits (or more) of information to determine its color, it's easy to see how image files can quickly reach very large sizes. Such large files pose two basic problems:

  1. Memory and disk storage are used up quickly, either limiting the number of pictures you can download to your hard drive or requiring you to have more RAM in your computer to process them.
  2. Electronic transmission, like downloading images from the web or emailing them to friends, is much slower.
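
To put a rough number on "very large," here is a minimal sketch of the arithmetic, assuming 24 bits per pixel and ignoring any file header. The function name and the 1000 x 1000 image are illustrative assumptions, not part of any particular file format.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel=24):
    """Return the raw pixel-data size in bytes, ignoring any file header."""
    return width * height * bits_per_pixel // 8

# A hypothetical one-megapixel image (1000 x 1000) at 24 bits per pixel.
size = uncompressed_size_bytes(1000, 1000)
print(size)              # 3000000
print(size / 1_000_000)  # 3.0  (megabytes, counting 1 MB as 1,000,000 bytes)
```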

   

JPEG photograph

Fortunately, large digital image files can be made a more reasonable size through a process known as compression.

We will examine the two main types of compression, lossless and lossy, and the major file types associated with each (the GIF and JPEG formats).

The GIF - A Lossless Compression

Lossless compression is the elementary form of compression, yet its principles hold true for all compression systems. It's referred to as lossless because the image file size is reduced without throwing out any information in the process.

Imagine a picture of a friend in front of a plain white wall. The top 25 percent of the picture is simply the pure white of the wall with no variations. If the image contains one million pixels, 250,000 adjacent pixels would be identical. An uncompressed file would account for all 250,000 white pixels separately. A lossless compression format would recognize this long string of identical values and store it in an abbreviated form known as a run. In this way, it's possible to significantly reduce the file size without sacrificing any of the original information.

One of the most widely used lossless formats is the Graphics Interchange Format (GIF). The principle behind it can be pictured very simply: when a horizontal string of pixels is the same color, the individual pixels are replaced with a number indicating the length of the string, coupled with information about its color. (The GIF's actual scheme, known as LZW, extends this idea from runs of a single color to repeated patterns of pixels.)
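
The run idea described above can be sketched in a few lines. This is a simplified illustration of run-length encoding, not the actual GIF encoder; the function names and the sample row are assumptions made for the example.

```python
def rle_encode(row):
    """Collapse a row of pixel values into [count, value] runs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1          # same color as the previous pixel: extend the run
        else:
            runs.append([1, pixel])   # new color: start a new run
    return runs

def rle_decode(runs):
    """Unpack [count, value] runs back into the original row."""
    row = []
    for count, value in runs:
        row.extend([value] * count)
    return row

WHITE = (255, 255, 255)
BLUE = (0, 0, 255)
row = [WHITE] * 8 + [BLUE] * 2 + [WHITE] * 6   # 16 pixels

encoded = rle_encode(row)
print(encoded)  # [[8, (255, 255, 255)], [2, (0, 0, 255)], [6, (255, 255, 255)]]
print(rle_decode(encoded) == row)  # True: nothing was lost
```

Sixteen pixels shrink to three runs, and decoding returns exactly the original row, which is what makes the scheme lossless.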

GIFs work great on any image that has large blocks of a single color, such as line drawings, simple graphic designs, and certain kinds of text. The illustration on the left and the drawing on the right are two good examples of typical GIF files.

example of gif image

example of gif image

The lossless format employed in GIFs doesn't work well on most photographs. Why? Because photographs can contain thousands of discrete colors all mixed together, with far fewer blocks of a single color. For example, you might describe an apple as red, but a digital photograph reveals hundreds of different shades of red across the surface of the apple.
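
A small sketch makes the problem concrete by counting runs in two rows of pixels. The "apple" row is invented for illustration: its red value drifts by one shade per pixel, standing in for the subtle variation found in a real photograph.

```python
def count_runs(row):
    """Count the runs of identical adjacent pixel values in a row."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every pixel
    for pixel in row:
        if pixel != previous:
            runs += 1
            previous = pixel
    return runs

flat_row = [(255, 255, 255)] * 100                   # plain white wall
apple_row = [(150 + i, 20, 20) for i in range(100)]  # hypothetical shades of red

print(count_runs(flat_row))   # 1
print(count_runs(apple_row))  # 100
```

With one run per pixel, storing a length alongside every color would make the "compressed" row larger than the original.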

Compare the relative complexity of the two images below. A GIF format would work for the image on the left, but would not be as effective for the image on the right:

good example for gif file

good example for jpeg compression

The JPEG - Intelligent Compression

A consortium of computer and photography experts known as the Joint Photographic Experts Group (JPEG) solved the problem of photographic compression by creating the JPEG form of compression. The JPEG has become the industry standard for digital photography. While the GIF format is lossless, the JPEG format is a lossy form of compression. That is, some of the original information is permanently discarded in the compression process.

In a perfect world, you wouldn't want to discard any information, but the JPEG format is a viable compromise. This format relies on a series of complex mathematical operations, known as algorithms, which can reduce a file's size by a ratio of about 10:1 without significantly degrading the image. A 500K image can be reduced to 50K, with the changes visible only upon very close inspection.

The JPEG format is extremely flexible, allowing you to compress a file at any number of different levels. You can save a file at anywhere from a 1:1 ratio all the way down to a 100:1 ratio. There are always trade-offs involved, however: higher quality images mean bigger files, and smaller files mean lower quality images.
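
The ratios above translate into file sizes by simple division. The sketch below uses the 500K example from the text; the function name is an assumption, and real results vary with image content.

```python
def compressed_size_kb(original_kb, ratio):
    """Estimate the compressed size for a given N:1 compression ratio."""
    return original_kb / ratio

for ratio in (1, 10, 100):
    print(f"{ratio}:1 -> {compressed_size_kb(500, ratio):.0f}K")
# 1:1 -> 500K
# 10:1 -> 50K
# 100:1 -> 5K
```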

Intelligent Loss

While the JPEG operates on a principle similar to that of the GIF, it adds another step: it actually removes small variations in color, which simplifies the image and makes it easier to compress. A loss of detail can compromise image quality, but the changes can be virtually invisible depending on the information removed. JPEGs rely on two techniques to achieve compression.

Reducing Variations in Hue

The human eye is much more sensitive to changes in brightness than to changes in hue. Our eyes can quickly pick up the difference between two reds if one is slightly darker than the other, but they may not detect a difference if the colors are equally bright. For this reason, the JPEG format sacrifices small variations in hue, while keeping as much information as possible on brightness.
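
One common way to act on this is to separate each pixel into a brightness value and two color values, then store the color values at reduced resolution. The sketch below assumes the RGB-to-YCbCr conversion used by JFIF-style JPEG files and shares one averaged color pair between every two pixels. The function names and sample pixels are illustrative, and real encoders offer several subsampling layouts.

```python
def rgb_to_ycbcr(r, g, b):
    """Split an RGB pixel into brightness (Y) and two color components (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_chroma(pixels):
    """Keep every pixel's brightness, but share one averaged (Cb, Cr) per pair."""
    ycc = [rgb_to_ycbcr(*p) for p in pixels]
    out = []
    for i in range(0, len(ycc), 2):
        pair = ycc[i:i + 2]
        cb = sum(p[1] for p in pair) / len(pair)
        cr = sum(p[2] for p in pair) / len(pair)
        out.extend((p[0], cb, cr) for p in pair)
    return out

# Two hypothetical neighboring reds with slightly different hue and brightness.
reds = [(200, 30, 30), (210, 20, 40)]
for y, cb, cr in subsample_chroma(reds):
    print(round(y, 2), round(cb, 2), round(cr, 2))
```

Each pixel keeps its own brightness, while the pair shares a single color value, so four numbers need to be stored instead of six.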

The large image below, on the left, is a JPEG with low compression. The top right image is a close-up of the same image, and the bottom right image is a close-up of the same image with more compression. The difference in quality between high and low compression levels is clear when comparing the images:

JPEG image

low and high compression jpeg zoom

Interpolation

With a GIF, decompression is a straightforward process. Every pixel is accounted for, so all you have to do is unpack the runs. JPEGs, on the other hand, will actually create pixels through a process called interpolation.

During the JPEG compression process, information about a number of adjacent pixels might be averaged together to create a single value, thus forming a run that can be stored more efficiently. When you open the file, the JPEG format recreates the missing detail by estimating, or interpolating, the original variations. For example, if the compressed file contains a medium red run right next to an orange run, it will create a border area of reddish-orange, so that one of the colors blends in smoothly with the next. For this reason, the JPEG format works poorly with text and other images with sharp transitions, since it creates a fuzzy border between two otherwise distinct colors.
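
The averaging and interpolating described above can be illustrated with a toy example on a single row of brightness values. This is a deliberately simplified sketch of the idea, not the actual JPEG algorithm; the function names and the sample edge are assumptions.

```python
def average_pairs(row):
    """'Compress' by replacing each pair of adjacent values with their average."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]

def interpolate(samples):
    """'Decompress' by inserting a midpoint between each stored value and the next."""
    row = []
    for i, value in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else value
        row.append(value)
        row.append((value + nxt) / 2)
    return row

# A sharp black-to-white edge, such as the side of a letter in a line of text.
edge = [0, 0, 0, 0, 255, 255, 255, 255]

stored = average_pairs(edge)
print(stored)               # [0.0, 0.0, 255.0, 255.0]
print(interpolate(stored))  # [0.0, 0.0, 0.0, 127.5, 255.0, 255.0, 255.0, 255.0]
```

The rebuilt row contains a gray value of 127.5 that never existed in the original, which is the fuzzy border that appears around text and other hard edges.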

good example where blending is ok

example where blending adds colors not in original

The Compression Trade-off

Ideally, we would have computers with large hard drives and fast modems, so compression wouldn't be necessary. But given current limitations, and the vast amount of information that even a single digital photograph can contain, compression is unavoidable. GIF image files have the advantage of lossless compression, which is ideal.

As you've seen, however, GIFs really aren't the best or most appropriate format for digital photographs, so we must turn to a lossy type of compression. The trick with lossy compression formats, such as the JPEG, is to find the right balance between file size (which we always want to be as small as possible) and definition (which we always want to be as high as possible).

What you want to do with your images will determine the appropriate amount of compression to apply. The JPEG file format allows you to achieve sufficient size reduction while retaining acceptable image quality.