This document is an introduction to the mp3 format and an explanation of the technology behind lossy compression in mp3 files.
The goal of audio compression is to reduce the space needed to store audio on digital media. On a CD, one second of audio takes 44,100 samples per second * 2 channels * 16 bits per sample = 1,411,200 bits, about 176 kB. Sending an uncompressed song over the internet (say, with a 56k modem) would therefore take fairly long (and for the impatient, almost forever).
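A quick back-of-the-envelope check of these numbers (a sketch; the 56 kbit/s modem rate here is the nominal line speed, ignoring protocol overhead):

```python
SAMPLE_RATE = 44_100      # samples per second (CD audio)
CHANNELS = 2
BITS_PER_SAMPLE = 16

bits_per_second = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE
bytes_per_second = bits_per_second // 8
print(bytes_per_second)   # 176400 bytes, i.e. ~176 kB per second

# A 4-minute song sent over a 56k modem:
song_bits = bits_per_second * 4 * 60
modem_bps = 56_000
print(song_bits / modem_bps / 60)  # 100.8 minutes
```

So an uncompressed song takes roughly 25 times longer to transfer than it does to play, which is exactly the problem compression addresses.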
There are several methods to compress audio data. Lossless compression methods compact the data in such a way that it can be restored bit-identically to the original; that is what e.g. the zip format does (even though it wasn't designed for audio). There are also lossy compression methods. These take advantage of signals in the audio that cannot be heard by humans and can thus safely be discarded. When the audio is reconstructed, those inaudible signals are left out, but because we can't hear them anyway, their absence isn't noticeable.
Unfortunately, lossy encoding methods can leave slight distortions in the audio signal which are sometimes audible, mostly to people with "golden ears". The loss of audible signal can be kept within limits, though.
Technically, the audio information in an mp3 file is stored in frames. An mp3 frame is a set of bytes describing a short piece of audio. Frames can have different bitrates, which means they have different lengths; however, every mp3 frame decodes to 1152 audio samples (for MPEG-1 Layer III). At a sample rate of 44100 Hz, a frame therefore holds 26.1 milliseconds of audio. When using variable bitrate, the encoder can decide to spend more bits on the more complex parts of the audio.
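The relation between bitrate, sample rate, and frame length can be made concrete. A sketch for MPEG-1 Layer III (the constant 144 is 1152 samples per frame divided by 8 bits per byte; real frames may also carry one optional padding byte, signalled by a header bit):

```python
def mp3_frame_size(bitrate_bps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Byte length of one MPEG-1 Layer III frame.

    1152 samples per frame / 8 bits per byte = 144, hence the constant.
    """
    return 144 * bitrate_bps // sample_rate_hz + (1 if padding else 0)

def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time of one frame: 1152 samples at the given rate."""
    return 1152 / sample_rate_hz * 1000

print(mp3_frame_size(128_000, 44_100))       # 417 bytes (418 with padding)
print(round(frame_duration_ms(44_100), 1))   # 26.1 ms
```

This is why a constant-bitrate 128 kbit/s file at 44.1 kHz alternates between 417- and 418-byte frames: the padding byte keeps the average rate exact.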
The audio data to compress is transformed into the frequency domain with a method called the MDCT (modified discrete cosine transform). From there, the encoder judges, using a psychoacoustic model, how the transformed spectral values are stored within a frame. The psychoacoustic model determines which values are needed for hearing and which ones can be dropped because they would be inaudible to the human ear. By improving the psychoacoustic model, quality can be increased even while using the same underlying technology.
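The MDCT itself is simple to state: a block of 2N time samples maps to N spectral coefficients, and consecutive blocks overlap by 50%. A naive, unoptimized sketch of the bare transform (a real encoder also applies a window function and the overlap, which this example omits):

```python
import math

def mdct(x):
    """Naive MDCT: 2N time samples -> N spectral coefficients.

    Direct evaluation of the defining sum; real encoders use a
    windowed, fast (FFT-based) version with 50% block overlap.
    """
    N = len(x) // 2
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

coeffs = mdct([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
print(len(coeffs))  # 4 -- half as many outputs as inputs
```

The halving of the output count is what makes the MDCT attractive here: despite the 50% overlap between blocks, the total number of coefficients equals the number of input samples, so no redundancy is added.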
The MPEG audio standard defines different layers, which describe slightly different frame formats and levels of coding complexity. Layer I is the simplest, whereas Layer III is the most complex and achieves the best compression. Because of this, Layer III is the most widely used, and "mp3" is simply its abbreviation.
The mp3 standard, also known as MPEG-1 Layer III, was first developed by the Fraunhofer Institute for Integrated Circuits (FhG IIS-A), beginning in 1987, primarily for digital audio broadcasting (DAB). Two years later, FhG patented the technologies behind the format. It was later submitted to the ISO (International Organization for Standardization) and added to the MPEG-1 standard as its audio compression scheme.
A nice explanation of the mp3 format can also be found here: Howstuffworks: how mp3 files work.