- Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!news.mathworks.com!fu-berlin.de!cs.tu-berlin.de!phade
- From: phade@cs.tu-berlin.de (Frank Gadegast)
- Newsgroups: alt.answers,comp.answers,news.answers
- Subject: MPEG-FAQ: multimedia compression [1/9]
- Followup-To: alt.binaries.multimedia
- Date: 9 Nov 1996 09:32:20 GMT
- Organization: Technical University of Berlin, Germany
- Lines: 1304
- Approved: news-answers-request@MIT.EDU
- Expires: 31 Dec 1996 12:00:00 GMT
- Message-ID: <561j34$otq$1@news.cs.tu-berlin.de>
- Reply-To: mpegfaq@powerweb.de
- NNTP-Posting-Host: 130.149.22.20
- Mime-Version: 1.0
- Content-Type: text/plain; charset=iso-8859-1
- Content-Transfer-Encoding: 8bit
- Summary: This is the summary about the ISO video and audioformats MPEG 1, 2 and 4
- Keywords: MPEG, FAQ, Compression
- Xref: senator-bedfellow.mit.edu alt.answers:21694 comp.answers:22304 news.answers:86419
-
- Archive-name: mpeg-faq/part1
- Last-modified: 1996/06/02
- Version: v 4.1 96/06/02
- Posting-Frequency: bimonthly
-
- ===========================================================================
-
- ~Subject: SECTION 0. - INTRO
-
- ====================================================
- THE MPEG-FAQ [Version 4.1 - 1. June 1996]
- ====================================================
- PHADE Software
- Inh. Dipl-Inform. Frank Gadegast
- Leibnizstr. 30
- 10625 Berlin, GERMANY
-
- Fon/Fax ++ 49 30 3128103
- E-mail phade@powerweb.de
- Web site http://www.powerweb.de/mpeg
-
-
- It's the eighth publication of this file. Lots of information has been
- changed (which has surely brought errors with it, see Murphy's Law).
-
- This eighth compilation is very different from the previous one, Version 4.0.
-
- First: The location of this file is:
-
- Text-Version : URL: ftp://ftp.powerweb.de/mpeg/faq/mpegfa41.zip
- [194.77.15.46]
- HTML-Version : URL: http://www.powerweb.de/mpeg/faq/
-
- My MPEG-related software and my DOS-ports of several
- programs can be found there too.
-
- Second: "The Internet MPEG Audio Archive" is there ! Our brilliant
- collection of everything that belongs to MPEG audio. For only
- DM 49,- ! Get it ! More than 400 MB of songs, documentation
- and utilities ! Read below about how to order !
-
- Third: "The Internet MPEG CD-Rom" is still available ! The unique
- collection of everything that belongs to MPEG. For only
- DM 49,90 ! Get it ! More than 600 MB of movies, songs,
- documentation and utilities ! Read below about how to order !
-
- Another CD-Rom containing material for MPEG-2 is about to be
- released ! It will be called the "MPEG-2 Movie Toolbox".
-
- Fourth: This FAQ and the famous MPEG Archive now have a completely new
- home on the PowerWeb site ! The newest FAQ and other
- MPEG-related information and utilities for all platforms
- can always be loaded using WWW from:
-
- URL=http://www.powerweb.de/mpeg
-
- And surely, there are more interesting things to find ;o)
-
-
- I add my comments in brackets []; lines (---- or ====) separate the
- chapters and questions.
-
- Please try to find out more information yourself. I had enough to do
- getting and preparing this information. And only bother me with file
- requests if it's not possible for you to get it somewhere else !!!
-
- If you want to contribute to this FAQ in any way, please e-mail directly
- (preferably by replying to this posting) to:
-
- mpegfaq@powerweb.de
-
- If you want to contribute to the MPEG Archive, please upload via ftp to
- ftp://ftp.powerweb.de/incoming/mpeg and notify mpeg@powerweb.de via
- e-mail about your contribution.
-
- Other useful information related to MPEG can be e-mailed to
-
- mpeg@powerweb.de
-
- Or send any additional information via fax or e-mail.
-
- Enjoy MPEG, KeyJ "MPEG" Phade (Frank Gadegast)
-
-
- -------------------------------------------------------------------------------
-
- ~Subject: Disclaimer
-
- I HAVE NOTHING TO DO WITH THE NAMED COMPANIES, NO BUSINESS;
- IT'S JUST MY PERSONAL INTEREST. COMPANIES ARE NAMED
- BECAUSE THEY ARE THE FIRST TO BRING REAL MULTIMEDIA TO THE
- WORLD. SURE, I MAKE ADVERTS FOR THEM WITH THIS FAQ, BUT HOPE-
- FULLY YOU, AS A READER OF THIS FAQ, WILL FORCE THEM TO PRODUCE
- MORE AND BETTER PRODUCTS.
-
- MOST ADDITIONAL INFORMATION IS WRITTEN AS PERSONAL COMMENT
- AND SHOULD NOT BE TAKEN AS PROVEN FACT. INFORMATION IS
- PRESENTED "AS IS", MAY BE OUT OF DATE AND CANNOT BE
- GUARANTEED TO BE THE TRUTH. THIS INFORMATION COMES WITHOUT
- WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION THE
- WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
- PURPOSE AND NON-INFRINGEMENT.
-
- UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, TORT, CONTRACT,
- OR OTHERWISE, SHALL THE AUTHOR BE LIABLE TO YOU OR ANY OTHER
- PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
- DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES
- FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR
- MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES.
-
- Frank Gadegast
-
- -------------------------------------------------------------------------------
-
- ~Subject: Copyright information
-
- THIS COMPILATION OF INFORMATION IS COPYRIGHTED BY THE AUTHOR
- AND MAINTAINER, CURRENTLY FRANK GADEGAST. ANY NON-COMMERCIAL
- USE OF IT, OR OF PARTS OF IT, IS ALLOWED, PROVIDED THE USE IS
- REPORTED TO THE AUTHOR AND THE COMPILATION IS KEPT UNCHANGED.
- ADDITIONALLY, IF PARTS OF IT ARE USED, INFORMATION HAS TO BE ADDED
- TO THAT PART STATING WHO ITS AUTHOR IS, THAT IT BELONGS
- TO THE COMPLETE COMPILATION AND WHERE TO FIND THE COMPLETE
- COMPILATION.
-
- COMMERCIAL USE CAN BE GRANTED IN SPECIAL CIRCUMSTANCES, FEEL
- FREE TO ASK AND SEND A DESCRIPTION OF THE INTENDED USE, TO
- RECEIVE A CERTIFICATION.
-
- ANY NON-REPORTED OR NON-CERTIFIED COMMERCIAL USE OF THIS
- COMPILATION IS A VIOLATION OF GERMAN COPYRIGHT LAW !
-
- ANY RE-PUBLICATION OF THE INFORMATION IN THIS COMPILATION SHOULD
- BE REPORTED TO THE AUTHOR AND SHOULD BE QUOTED IN THE NEW
- PUBLICATION.
-
- ANY RE-DISTRIBUTION OF THE COMPLETE FILE ON NON-COMMERCIAL
- ARCHIVES, LIKE FTP- OR FAQ-MIRRORS IS ALLOWED.
-
- -------------------------------------------------------------------------------
-
- ~Subject: Digest format
-
- It should be possible to read this FAQ with a threaded newsreader or emacs
- in FAQ-mode, enabling you to jump from one question to another, because
- this FAQ is organized as a digest.
-
- You can move to the next question with the digest commands in gnus, rn or
- other newsreaders, or with a regex search for ^~Subject or ^--.
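-
- [ If you have saved this FAQ to a file, a few lines of Python can list
- the digest entries; this is just an illustration - the filename
- "mpegfa41.txt" is an example, and the iso-8859-1 encoding matches the
- posting headers. ]
-
-     import re
-
-     # List every digest entry: each one starts with a "~Subject:" line.
-     with open("mpegfa41.txt", encoding="iso-8859-1") as f:
-         text = f.read()
-
-     for match in re.finditer(r"^~Subject:\s*(.*)$", text, flags=re.MULTILINE):
-         print(match.group(1))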
-
- -------------------------------------------------------------------------------
-
- ~Subject: Recommendations
-
- Well, to stop some of the most annoying questions from those who do not read
- this FAQ at all, I recommend the following players/decoders and encoders.
- Search the FAQ for these words and download them BEFORE e-mailing me !
-
- DOS: VMPEG, MAPLAYPC and CMPEG, ENC11BIN
- Windows: VMPEG, SoftPeg, COOL 1.5.3 and Maplay 1.2 for Win32
- Unix: XMPLAY and VCR
-
- CD-I's and Video-CDs are currently only supported by VMPEG and SoftPeg !
-
- -------------------------------------------------------------------------------
-
- ~Subject: What questions are getting answered in this FAQ ?
-
- SECTION 0. - INTRO
- Disclaimer
- Copyright information
- Digest format
- What questions are getting answered in this FAQ ?
- SECTION 1. - WHAT IS MPEG-VIDEO/VIDEO
- What is MPEG ?
- What is MPEG-Audio then ?
- What is the Audio Layer 3 then ?
- What is MPEG-1+ ?
- What is MPEG-2 ?
- What happened at the MPEG - NY meeting ?
- What's about Video-CD and CD-I ?
- SECTION 2. - PROFESSIONAL SOFTWARE
- SUBSECTION - DOS
- MPEG Encoder by Xing
- SUBSECTION - WINDOWS
- MPEG ARCADETM
- XingSound
- XingCD
- SUBSECTION - UNIX
- Xing Distributed Media Architecture
- NVR Research Kit
- Demo of NVR Digital Media Development Kit
- How will I get the NVR-Software ?
- SECTION 3. - FREE AVAILABLE SOFTWARE
- SUBSECTION - DOS
- layr_100
- mpeg2ppm
- vmpeg
- cmpeg
- dmpeg
- secmpeg
- mpegstat
- enc11dos
- pvrg MPEG
- SUBSECTION - Windows
- XingIt
- mpgaudio
- SUBSECTION - WINDOWS-NT
- mpeg2ply
- mpegplay
- SUBSECTION - OS/2
- mp
- SUBSECTION - X-WINDOWS and UNIX
- Berkeley's MPEG Tools
- MPEG-1 Video Software Encoder
- MPEG Video Software Decoder
- MPEG Video Software Analyzer
- MPEG Blocks Analyzer
- MPEG Video Software Statistics Gatherer
- xmg
- mpegstat
- mplex
- xmplay
- xplayer
- xmpeg.tk
- mpeg2encode / mpeg2decode
- mpegaudio
- maplay
- Scanning MPEG's ...
- MPEG decoder...
- MPEGTool
- What is "SECMPEG" ?
- PVRG-MPEG Codec
- wdgt
- SUBSECTION - VMS
- vms MPEG
- SUBSECTION - MacIntosh
- Sparcle
- Qt2MPEG
- Audio on Macintosh ?!
- SUBSECTION - Atari
- SUBSECTION - Amiga
- MPEG2DCTV
- SUBSECTION - NeXT
- MPEG_Play.app
- mpegnext
- SUBSECTION - SGI
- SECTION 4. - MPEG-RELATED HARDWARE
- MPEG audio Layer-3
- Video-Maker
- Some MPEG chips
- Optibase
- ReelMagic
- Cinerama
- XingIt!-card
- MPEG-decompression hardware list
- Amiga CD32
- SECTION 5. - MAILBOX-ACCESS
- Genoabox
- Xing Technologies BBS and fax
- SECTION 6. - FTP-ACCESS
- FTP-ACCESS - Overview
- MPEG-2 validation bitstreams
- Audio streams and utils
- Accessing Aminet
- Where will I find test-material for MPEG-encoders ?
- SECTION 7. - WWW-ACCESS
- Where is the WWW-home of this FAQ ?
- An Interactive Explanation on the Web ?
- Where is the WWW-demo of "The Internet MPEG CD-Rom" ?
- Which archive is mostly related to MPEG-Audio ?
- What's with Bryan Woodworth ftp-area ?
- Rock'n'Roll stored in MPEG on the Web ?
- Where can I find space movies coded in MPEG ?
- Movies on Web-site
- Where can I find fractal movies coded in MPEG ?
- Is qt2mpeg on the Web ?
- What are other good URL's ?
- SECTION 8. - MAIL ORDER
- The Internet MPEG CD-Rom
- Conversion, WWW and CD-Rom production service
- How can I order information from C-CUBE ?
- SECTION 9. - ADDITIONAL INFORMATION
- What are the MPEG standard documents ?
- So, the Xing decoder is cheating, right ?
- What is Aware Inc. doing ?
- Will MPEG be included in QuickTime ?
- What's about MPEG-2 software ?
- What about good MPEG Hardware encoders (Optivision) ?
- What's about CD-I ?
- What is the PCMotion Player ?
- What is the MPEG-2 ISO number ?
- Some papers about MPEG-audio
- Where can I find more documents about what Berkeley is doing ?
- Is there a book about MPEG ?
- Who are CD-I producers ?
- Where can I get VideoCD and CD-I coding ?
- Where can I do MPEG encoding ?
- What's the problem with all these file extensions for MPEG-files ?
- How can I do RTP encapsulation of MPEG1/MPEG2 ?
- Where can I order the MPEG standard ?
- SECTION 10. - WHERE TO FIND MORE INFOS
- What newsgroups discuss MPEG ?
- How can 'archie' help me ?
- SECTION 11. - QUESTIONS
-
- ===========================================================================
-
- ~Subject: SECTION 1. - WHAT IS MPEG-VIDEO/VIDEO
-
- -------------------------------------------------------------------------------
-
- ~Subject: What is MPEG ?
-
- From comp.compression Mon Oct 19 15:38:38 1992
- Sender: news@chorus.chorus.fr
- Author: Mark Adler <madler@alumni.caltech.edu>
-
- [71] Introduction to MPEG (long)
- What is MPEG?
- Does it have anything to do with JPEG?
- Then what's JBIG and MHEG?
- What has MPEG accomplished?
- So how does MPEG I work?
- What about the audio compression?
- So how much does it compress?
- What's phase II?
- When will all this be finished?
- How do I join MPEG?
- How do I get the documents, like the MPEG I standard?
-
- [ There is no newer version of this part so far. Whoever wants to update ]
- [ this description, should do the job and send it over. ]
-
- Written by Mark Adler <madler@alumni.caltech.edu>.
-
- Q. What is MPEG?
- A. MPEG is a group of people that meet under ISO (the International
- Standards Organization) to generate standards for digital video
- (sequences of images in time) and audio compression. In particular,
- they define a compressed bit stream, which implicitly defines a
- decompressor. However, the compression algorithms are up to the
- individual manufacturers, and that is where proprietary advantage
- is obtained within the scope of a publicly available international
- standard. MPEG meets roughly four times a year for roughly a week
- each time. In between meetings, a great deal of work is done by
- the members, so it doesn't all happen at the meetings. The work
- is organized and planned at the meetings.
-
- Q. So what does MPEG stand for?
- A. Moving Pictures Experts Group.
-
- Q. Does it have anything to do with JPEG?
- A. Well, it sounds the same, and they are part of the same subcommittee
- of ISO along with JBIG and MHEG, and they usually meet at the same
- place at the same time. However, they are different sets of people
- with few or no common individual members, and they have different
- charters and requirements. JPEG is for still image compression.
-
- Q. Then what's JBIG and MHEG?
- A. Sorry I mentioned them. Ok, I'll simply say that JBIG is for binary
- image compression (like faxes), and MHEG is for multi-media data
- standards (like integrating stills, video, audio, text, etc.).
- For an introduction to JBIG, see question 74 below.
-
- Q. Ok, I'll stick to MPEG. What has MPEG accomplished?
- A. So far (as of January 1996), they have completed the standard
- of MPEG phase I, colloquially called MPEG I. This defines
- a bit stream for compressed video and audio optimized to fit into
- a bandwidth (data rate) of 1.5 Mbits/s. This rate is special
- because it is the data rate of (uncompressed) audio CD's and DAT's.
- The standard is in three parts, video, audio, and systems, where the
- last part gives the integration of the audio and video streams
- with the proper timestamping to allow synchronization of the two.
- They have also gotten well into MPEG phase II, whose task is to
- define a bitstream for video and audio coded at around 3 to 10
- Mbits/s.
-
- Q. So how does MPEG I work?
- A. First off, it starts with a relatively low resolution video
- sequence (possibly decimated from the original) of about 352 by
- 240 pixels by 30 frames/s (US--different numbers for Europe),
- but original high (CD) quality audio. The images are in color,
- but converted to YUV space, and the two chrominance channels
- (U and V) are decimated further to 176 by 120 pixels. It turns
- out that you can get away with a lot less resolution in those
- channels and not notice it, at least in "natural" (not computer
- generated) images.
-
- [ The HTML version shows diagrams of the YUV 4:1:1, 4:2:2 and 4:4:4
- chroma subsampling layouts (yuv411.gif, yuv422.gif, yuv444.gif). ]
-
- The basic scheme is to predict motion from frame to frame in the
- temporal direction, and then to use DCT's (discrete cosine
- transforms) to organize the redundancy in the spatial directions.
- The DCT's are done on 8x8 blocks, and the motion prediction is
- done in the luminance (Y) channel on 16x16 blocks. In other words,
- given the 16x16 block in the current frame that you are trying to
- code, you look for a close match to that block in a previous or
- future frame (there are backward prediction modes where later
- frames are sent first to allow interpolating between frames).
- The DCT coefficients (of either the actual data, or the difference
- between this block and the close match) are "quantized", which
- means that you divide them by some value to drop bits off the
- bottom end. Hopefully, many of the coefficients will then end up
- being zero. The quantization can change for every "macroblock"
- (a macroblock is 16x16 of Y and the corresponding 8x8's in both
- U and V). The results of all of this, which include the DCT
- coefficients, the motion vectors, and the quantization parameters
- (and other stuff) is Huffman coded using fixed tables. The DCT
- coefficients have a special Huffman table that is "two-dimensional"
- in that one code specifies a run-length of zeros and the non-zero
- value that ended the run. Also, the motion vectors and the DC
- DCT components are DPCM (subtracted from the last one) coded.
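-
- [ A tiny Python sketch of the "DCT then quantize" step described above,
- for a single 8x8 block. The flat quantizer value of 16 and the random
- block are made up for illustration; a real encoder uses the standard's
- quantization matrices and then run-length/Huffman codes the result. ]
-
-     import numpy as np
-
-     def dct2(block):
-         # Naive 8x8 2-D DCT-II, the transform used on 8x8 blocks.
-         n = block.shape[0]
-         i = np.arange(n)
-         c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i[None, :] + 1)
-                                       * i[:, None] / (2 * n))
-         c[0, :] = np.sqrt(1.0 / n)
-         return c @ block @ c.T
-
-     block = 64 + 32 * np.random.rand(8, 8)         # toy luminance block
-     coeff = dct2(block)
-     quantized = np.round(coeff / 16).astype(int)   # drop bits off the bottom
-     print("non-zero coefficients left:", np.count_nonzero(quantized))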
-
- Q. So is each frame predicted from the last frame?
- A. No. The scheme is a little more complicated than that. There are
- three types of coded frames. There are "I" or intra frames. They
- are simply a frame coded as a still image, not using any past
- history. You have to start somewhere. Then there are "P" or
- predicted frames. They are predicted from the most recently
- reconstructed I or P frame. (I'm describing this from the point
- of view of the decompressor.) Each macroblock in a P frame can
- either come with a vector and difference DCT coefficients for a
- close match in the last I or P, or it can just be "intra" coded
- (like in the I frames) if there was no good match.
-
- Lastly, there are "B" or bidirectional frames. They are predicted
- from the closest two I or P frames, one in the past and one in the
- future. You search for matching blocks in those frames, and try
- three different things to see which works best. (Now I have the
- point of view of the compressor, just to confuse you.) You try using
- the forward vector, the backward vector, and you try averaging the
- two blocks from the future and past frames, and subtracting that from
- the block being coded. If none of those work well, you can intra-
- code the block.
-
- The sequence of decoded frames usually goes like:
-
- IBBPBBPBBPBBIBBPBBPB...
-
- Where there are 12 frames from I to I (for US and Japan anyway.)
- This is based on a random access requirement that you need a
- starting point at least once every 0.4 seconds or so. The ratio
- of P's to B's is based on experience.
-
- Of course, for the decoder to work, you have to send that first
- P *before* the first two B's, so the compressed data stream ends
- up looking like:
-
- 0xx312645...
-
- where those are frame numbers. xx might be nothing (if this is
- the true starting point), or it might be the B's of frames -2 and
- -1 if we're in the middle of the stream somewhere.
-
- You have to decode the I, then decode the P, keep both of those
- in memory, and then decode the two B's. You probably display the
- I while you're decoding the P, and display the B's as you're
- decoding them, and then display the P as you're decoding the next
- P, and so on.
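-
- [ A small Python sketch of the reordering just described: anchor frames
- (I or P) are transmitted before the B frames predicted from them. The
- GOP pattern is the example from the text; B frames at the very end of a
- stream would really need the next I frame, which is ignored here. ]
-
-     pattern = "IBBPBBPBBPBB"            # display order, frames 0..11
-
-     transmit, pending_b = [], []
-     for number, kind in enumerate(pattern):
-         if kind == "B":
-             pending_b.append(number)    # hold Bs until their anchor is sent
-         else:                           # I or P: send it, then the held Bs
-             transmit.append(number)
-             transmit.extend(pending_b)
-             pending_b = []
-     transmit.extend(pending_b)
-
-     print("display  order:", list(range(len(pattern))))
-     print("transmit order:", transmit)  # 0 3 1 2 6 4 5 9 7 8 10 11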
-
- Q. You've got to be kidding.
- A. No, really!
-
- Q. Hmm. Where did they get 352x240?
- A. That derives from the CCIR-601 digital television standard which
- is used by professional digital video equipment. It is (in the US)
- 720 by 243 by 60 fields (not frames) per second, where the fields
- are interlaced when displayed. (It is important to note though
- that fields are actually acquired and displayed a 60th of a second
- apart.) The chrominance channels are 360 by 243 by 60 fields a
- second, again interlaced. This degree of chrominance decimation
- (2:1 in the horizontal direction) is called 4:2:2. The source
- input format for MPEG I, called SIF, is CCIR-601 decimated by 2:1
- in the horizontal direction, 2:1 in the time direction, and an
- additional 2:1 in the chrominance vertical direction. And some
- lines are cut off to make sure things divide by 8 or 16 where
- needed.
-
- Q. What if I'm in Europe?
- A. For 50 Hz display standards (PAL, SECAM) change the number of lines
- in a field from 243 or 240 to 288, and change the display rate to
- 50 fields/s or 25 frames/s. Similarly, change the 120 lines in
- the decimated chrominance channels to 144 lines. Since 288*50 is
- exactly equal to 240*60, the two formats have the same source data
- rate.
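-
- [ The arithmetic above, spelled out as a few lines of Python. The trim
- to a multiple of 16 is the "some lines are cut off" step; this is only
- a back-of-the-envelope sketch, not a statement of the standard. ]
-
-     def sif(ccir_width=720, field_lines=243, field_rate=60):
-         width = ccir_width // 2          # 2:1 horizontal decimation
-         height = field_lines             # keep one field (2:1 in time)
-         rate = field_rate // 2
-         width -= width % 16              # trim so 16x16 macroblocks fit
-         height -= height % 16
-         return width, height, rate
-
-     print(sif())                                  # (352, 240, 30)
-     print(sif(field_lines=288, field_rate=50))    # (352, 288, 25)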
-
- Q. You didn't mention anything about the audio compression.
- A. Oh, right. Well, I don't know as much about the audio compression.
- Basically they use very carefully developed psychoacoustic models
- derived from experiments with the best obtainable listeners to
- pick out pieces of the sound that you can't hear. There are what
- are called "masking" effects where, for example, a large component
- at one frequency will prevent you from hearing lower energy parts
- at nearby frequencies, where the relative energy vs. frequency
- that is masked is described by some empirical curve. There are
- similar temporal masking effects, as well as some more complicated
- interactions where a temporal effect can unmask a frequency, and
- vice-versa.
-
- The sound is broken up into spectral chunks with a hybrid scheme
- that combines sine transforms with subband transforms, and the
- psychoacoustic model written in terms of those chunks. Whatever
- can be removed or reduced in precision is, and the remainder is
- sent. It's a little more complicated than that, since the bits
- have to be allocated across the bands. And, of course, what is
- sent is entropy coded.
-
- Q. So how much does it compress?
- A. As I mentioned before, audio CD data rates are about 1.5 Mbits/s.
- You can compress the same stereo program down to 256 Kbits/s with
- no loss in discernable quality. (So they say. For the most part
- it's true, but every once in a while a weird thing might happen
- that you'll notice. However the effect is very small, and it takes
- a listener trained to notice these particular types of effects.)
- That's about 6:1 compression. So, a CD MPEG I stream would have
- about 1.25 MBits/s left for video. The number I usually see though
- is 1.15 MBits/s (maybe you need the rest for the system data
- stream). You can then calculate the video compression ratio from
- the numbers here to be about 26:1. If you step back and think
- about that, it's little short of a miracle. Of course, it's lossy
- compression, but it can be pretty hard sometimes to see the loss,
- if you're comparing the SIF original to the SIF decompressed. There
- is, however, a very noticeable loss if you're coming from CCIR-601
- and have to decimate to SIF, but that's another matter. I'm not
- counting that in the 26:1.
-
- The standard also provides for other bit rates ranging from 32Kbits/s
- for a single channel, up to 448 Kbits/s for stereo.
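-
- [ A quick Python check of the ratios quoted above, using the SIF
- picture size and 8 bits per sample; the 1.5 Mbit/s audio figure and
- the 1.15 Mbit/s video figure are the ones from the text. ]
-
-     luma = 352 * 240
-     chroma = 2 * (176 * 120)                  # U and V planes
-     sif_bps = (luma + chroma) * 8 * 30        # uncompressed SIF, bits/s
-
-     print("audio ~ %.0f:1" % (1.5e6 / 256e3))       # about 6:1
-     print("video ~ %.0f:1" % (sif_bps / 1.15e6))    # about 26:1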
-
- Q. What's phase II?
- A. As I said, there is a considerable loss of quality in going from
- CCIR-601 to SIF resolution. For entertainment video, it's simply
- not acceptable. You want to use more bits and code all or almost
- all the CCIR-601 data. From subjective testing at the Japan
- meeting in November 1991, it seems that 4 MBits/s can give very
- good quality compared to the original CCIR-601 material. The
- objective of phase II is to define a bit stream optimized for these
- resolutions and bit rates.
-
- Q. Why not just scale up what you're doing with MPEG I?
- A. The main difficulty is the interlacing. The simplest way to extend
- MPEG I to interlaced material is to put the fields together into
- frames (720x486x30/s). This results in bad motion artifacts that
- stem from the fact that moving objects are in different places
- in the two fields, and so don't line up in the frames. Compressing
- and decompressing without taking that into account somehow tends to
- muddle the objects in the two different fields.
-
- The other thing you might try is to code the even and odd field
- streams separately. This avoids the motion artifacts, but as you
- might imagine, doesn't get very good compression since you are not
- using the redundancy between the even and odd fields where there
- is not much motion (which is typically most of the image).
-
- Or you can code it as a single stream of fields. Or you can
- interpolate lines. Or, etc. etc. There are many things you can
- try, and the point of MPEG II is to figure out what works well.
- MPEG II is not limited to consider only derivations of MPEG I.
- There were several non-MPEG I-like schemes in the competition in
- November, and some aspects of those algorithms may or may not
- make it into the final standard for entertainment video compression.
-
- Q. So what works?
- A. Basically, derivations of MPEG I worked quite well, with one that
- used wavelet subband coding instead of DCT's that also worked very
- well. Also among the worked-very-well's was a scheme that did not
- use B frames at all, just I and P's. All of them, except maybe one,
- did some sort of adaptive frame/field coding, where a decision is
- made on a macroblock basis as to whether to code that one as one
- frame macroblock or as two field macroblocks. Some other aspects
- are how to code I-frames--some suggest predicting the even field
- from the odd field. Or you can predict evens from evens and odds
- or odds from evens and odds or any field from any other field, etc.
-
- Q. So what works?
- A. Ok, we're not really sure what works best yet. The next step is
- to define a "test model" to start from, that incorporates most of
- the salient features of the worked-very-well proposals in a
- simple way. Then experiments will be done on that test model,
- making a mod at a time, and seeing what makes it better and what
- makes it worse. Example experiments are, B's or no B's, DCT vs.
- wavelets, various field prediction modes, etc. The requirements,
- such as implementation cost, quality, random access, etc. will all
- feed into this process as well.
-
- Q. When will all this be finished?
- A. I don't know. I'd have to hope in about a year or less.
-
- Q. How do I join MPEG?
- A. You don't join MPEG. You have to participate in ISO as part of a
- national delegation. How you get to be part of the national
- delegation is up to each nation. I only know the U.S., where you
- have to attend the corresponding ANSI meetings to be able to
- attend the ISO meetings. Your company or institution has to be
- willing to sink some bucks into travel since, naturally, these
- meetings are held all over the world. (For example, Paris,
- Santa Clara, Kurihama Japan, Singapore, Haifa Israel, Rio de
- Janeiro, London, etc.)
-
- Q. Well, then how do I get the documents, like the MPEG I standard ?
- A. MPEG is an ISO standard. Its exact name is ISO CD 11172.
- The standard consists of three parts: System, Video, and Audio. The
- System part (11172-1) deals with synchronization and multiplexing
- of audio-visual information, while the Video (11172-2) and Audio
- part (11172-3) address the video and the audio compression techniques
- respectively.
-
- You may order it from your national standards body (e.g. ANSI in
- the USA) or buy it from companies like
- OMNICOM
- phone +44 438 742424
- FAX +44 438 740154
-
- Or from 'ISO Online' at http://www.iso.ch/welcome.html
-
- -------------------------------------------------------------------------------
-
- ~Subject: What is MPEG-Audio then ?
-
- From: "Harald Popp" <POPP@iis.fhg.de>
- From: mortenh@oslonett.no
- Date: Fri, 25 Mar 1994 19:09:06 +0100
-
- Q. What is MPEG?
- A. MPEG is an ISO committee that proposes standards for
- compression of Audio and Video. MPEG deals with 3 issues:
- Video, Audio, and System (the combination of the two into one
- stream). You can find more info on the MPEG committee in other
- parts of this document.
-
- Q. I've heard about MPEG Video. So this is the same compression
- applied to audio?
- A. Definitely no. The eye and the ear... even if they are only a
- few centimeters apart, work very differently... The ear has
- a much higher dynamic range and resolution. It can pick out
- more details but it is "slower" than the eye.
- The MPEG committee chose to recommend 3 compression methods
- and named them Audio Layer-1, Layer-2, and Layer-3.
-
- Q. What does it mean exactly?
- A. MPEG-1, IS 11172-3, describes the compression of audio
- signals using high performance perceptual coding schemes.
- It specifies a family of three audio coding schemes,
- simply called Layer-1,-2,-3, with increasing encoder
- complexity and performance (sound quality per bitrate).
- The three codecs are compatible in a hierarchical
- way, i.e. a Layer-N decoder is able to decode bitstream data
- encoded in Layer-N and all Layers below N (e.g., a Layer-3
- decoder may accept Layer-1,-2 and -3, whereas a Layer-2
- decoder may accept only Layer-1 and -2.)
-
- Q. So we have a family of three audio coding schemes. What does
- the MPEG standard define, exactly?
- A. For each Layer, the standard specifies the bitstream format
- and the decoder. It does *not* specify the encoder to
- allow for future improvements, but an informative chapter
- gives an example for an encoder for each Layer.
-
- Q. What do the three audio Layers have in common?
- A. All Layers use the same basic structure. The coding scheme can
- be described as "perceptual noise shaping" or "perceptual
- subband / transform coding".
- The encoder analyzes the spectral components of the audio
- signal by calculating a filterbank or transform and applies
- a psychoacoustic model to estimate the just noticeable
- noise-level. In its quantization and coding stage, the
- encoder tries to allocate the available number of data
- bits in a way to meet both the bitrate and masking
- requirements.
- The decoder is much less complex. Its only task is to
- synthesize an audio signal out of the coded spectral
- components.
- All Layers use the same analysis filterbank (polyphase with
- 32 subbands). Layer-3 adds a MDCT transform to increase
- the frequency resolution.
- All Layers use the same "header information" in their
- bitstream, to support the hierarchical structure of the
- standard.
- All Layers use a bitstream structure that contains parts that
- are more sensitive to biterrors ("header", "bit
- allocation", "scalefactors", "side information") and parts
- that are less sensitive ("data of spectral components").
- All Layers may use 32, 44.1 or 48 kHz sampling frequency.
- All Layers are allowed to work with similar bitrates:
- Layer-1: from 32 kbps to 448 kbps
- Layer-2: from 32 kbps to 384 kbps
- Layer-3: from 32 kbps to 320 kbps
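-
- [ The bitrate ranges and sampling frequencies listed above, written as
- a small Python lookup. This only checks the ranges quoted here, not
- the exact bitrate steps the standard actually allows. ]
-
-     BITRATE_RANGE_KBPS = {1: (32, 448), 2: (32, 384), 3: (32, 320)}
-     SAMPLE_RATES_HZ = (32000, 44100, 48000)
-
-     def plausible_mpeg1_audio(layer, kbps, hz):
-         lo, hi = BITRATE_RANGE_KBPS[layer]
-         return lo <= kbps <= hi and hz in SAMPLE_RATES_HZ
-
-     print(plausible_mpeg1_audio(3, 128, 44100))  # True
-     print(plausible_mpeg1_audio(2, 448, 48000))  # False, 448 kbps is Layer-1 only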
-
- Q. What are the main differences between the three Layers, from a
- global view?
- A. From Layer-1 to Layer-3,
- complexity increases (mainly true for the encoder),
- overall codec delay increases, and
- performance increases (sound quality per bitrate).
-
- Q. Which Layer should I use for my application?
- A. Good Question. Of course, it depends on all your requirements.
- But as a first approach, you should consider the available
- bitrate of your application as the Layers have been
- designed to support certain areas of bitrates most
- efficiently, i.e. with a minimum drop of sound quality.
- Let us look a little closer at the strong domains of each
- Layer.
-
- Layer-1: Its ISO target bitrate is 192 kbps per audio
- channel.
- Layer-1 is a simplified version of Layer-2. It is most useful
- for "high" bitrates, around or above
- 192 kbps. A version of Layer-1 is used as "PASC" with the
- DCC recorder.
-
- Layer-2: Its ISO target bitrate is 128 kbps per audio
- channel.
- Layer-2 is identical with MUSICAM. It has been designed as
- trade-off between sound quality per bitrate and encoder
- complexity. It is most useful for bitrates around the
- "medium" bitrates of 128 or even 96 kbps per audio
- channel. The DAB (EU 147) proponents have decided to use
- Layer-2 in the future Digital Audio Broadcasting network.
-
- Layer-3: Its ISO target bitrate is 64 kbps per audio channel.
- Layer-3 merges the best ideas of MUSICAM and ASPEC. It has
- been designed for best performance at "low" bitrates
- around 64 kbps or even below. The Layer-3 format specifies
- a set of advanced features that all address one goal: to
-
- preserve as much sound quality as possible even at rather
- low bitrates. Today, Layer-3 is already in use in various
- telecommunication networks (ISDN, satellite links, and so
- on) and speech announcement systems.
-
- Q. So how does MPEG audio work?
- A. Well, first you need to know how sound is stored in a
- computer. Sound is pressure differences in air. When picked up
- by a microphone and fed through an amplifier this becomes
- voltage levels. The voltage is sampled by the computer a
- number of times per second. For CD audio quality you need to
- sample 44100 times per second and each sample has a resolution
- of 16 bits. In stereo this gives you 1.4 Mbit per second
- and you can probably see the need for compression.
-
- To compress audio MPEG tries to remove the irrelevant parts
- of the signal and the redundant parts of the signal. Parts of
- the sound that we do not hear can be thrown away. To do this
- MPEG Audio uses psychoacoustic principles.
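-
- [ The "1.4 Mbit per second" figure above, spelled out in Python. ]
-
-     sample_rate = 44100            # samples per second
-     bits_per_sample = 16
-     channels = 2                   # stereo
-     print(sample_rate * bits_per_sample * channels)   # 1411200 bits/s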
-
- Q. Tell me more about sound quality. How good is MPEG audio
- compression? And how do you assess that?
- A. Today, there is no alternative to expensive listening tests.
- During the ISO-MPEG-1 process, 3 international listening tests
- have been performed, with a lot of trained listeners,
- supervised by Swedish Radio. They took place in July 1990, March 1991
- and November 1991. Another international listening test was
- performed by the CCIR, now ITU-R, in 1992.
- All these tests used the "triple stimulus, hidden reference"
- method and the so-called CCIR impairment scale to assess the
- audio quality.
- The listening sequence is "ABC", with A = original, BC = pair
- of original / coded signal with random sequence, and the
- listener has to evaluate both B and C with a number
- between 1.0 and 5.0. The meaning of these values is:
- 5.0 = transparent (this should be the original signal)
- 4.0 = perceptible, but not annoying (first differences
- noticeable)
- 3.0 = slightly annoying
- 2.0 = annoying
- 1.0 = very annoying
- With perceptual codecs (like MPEG audio), all traditional
- parameters (like SNR, THD+N, bandwidth) are rather
- useless.
-
- Fraunhofer-IIS (among others) works on objective quality
- assessment tools, like the NMR meter (Noise-to-Mask-Ratio),
- too. If you need more information about NMR, please
- contact nmr@iis.fhg.de
-
- Q. Now that I know how to assess quality, come on, tell me the
- results of these tests.
- A. Well, for details you should study one of those AES papers
- listed below. One main result is that for low bitrates (60
- or 64 kbps per channel, i.e. a compression ratio of around
- 12:1), Layer-2 scored between 2.1 and 2.6, whereas Layer-3
- scored between 3.6 and 3.8.
- This is a significant increase in sound quality, indeed!
- Furthermore, the selection process for critical sound material
- showed that it was rather difficult to find worst-case
- material for Layer-3 whereas it was not so hard to find
- such items for Layer-2.
- For medium and high bitrates (120 kbps or more per channel),
- Layer-2 and Layer-3 scored rather similar, i.e. even
- trained listeners found it difficult to detect differences
- between original and reconstructed signal.
-
- Q. So how does MPEG achieve this compression ratio?
- A. Well, with audio you basically have two alternatives. Either
- you sample less often or you sample with less resolution (less
- than 16 bit per sample). If you want quality you can't do much
- with the sample frequency. Humans can hear sounds with
- frequencies from about 20Hz to 20kHz. According to the Nyquist
- theorem you must sample at least two times the highest
- frequency you want to reproduce. Allowing for imperfect
- filters, a 44.1 kHz sampling rate is a fair minimum. So
- you either set out to prove the Nyquist theorem is wrong or
- go to work on reducing the resolution. The MPEG committee
- chose the latter.
- Now, the real reason for using 16 bits is to get a good
- signal-to-noise (s/n) ratio. The noise we're talking
- about here is quantization noise from the digitizing
- process. For each bit you add, you get 6dB
- better s/n. (To the ear, 6 dB corresponds to a doubling of
- the sound level.) CD-audio achieves about 90dB s/n. This
- matches the dynamic range of the ear fairly well. That is, you
- will not hear any noise coming from the system itself (well,
- there are still some people arguing about that, but let's not
- worry about them for the moment).
- So what happens when you sample at 8-bit resolution? You get
- a very noticeable noise floor in your recording. You can
- easily hear this in silent moments in the music or between
- words or sentences if your recording is a human voice.
- Waitaminnit. You don't notice any noise in loud passages,
- right? This is the masking effect and is the key to MPEG Audio
- coding. Stuff like the masking effect belongs to a science
- called psycho-acoustics that deals with the way the human
- brain perceives sound.
- And MPEG uses psychoacoustic principles when it does its
- thing.
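-
- [ A small Python experiment illustrating the "about 6 dB of s/n per
- bit" rule above: quantize the same 1 kHz sine to 16 and to 8 bits and
- measure the quantization noise. The exact numbers depend on the test
- signal; only the roughly-6-dB-per-bit trend matters here. ]
-
-     import numpy as np
-
-     t = np.arange(48000) / 48000.0
-     x = np.sin(2 * np.pi * 1000 * t)          # full-scale 1 kHz tone
-
-     def snr_db(signal, bits):
-         step = 2.0 / (2 ** bits)              # quantizer step for [-1, 1]
-         noise = signal - np.round(signal / step) * step
-         return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))
-
-     print("16 bit: %.0f dB" % snr_db(x, 16))  # roughly 98 dB
-     print(" 8 bit: %.0f dB" % snr_db(x, 8))   # roughly 50 dB, an audible noise floor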
-
- Q. Explain this masking effect.
- A. OK, say you have a strong tone with a frequency of 1000Hz.
- You also have a tone nearby of say 1100Hz. This second tone is
- 18 dB lower. You are not going to hear this second tone. It is
- completely masked by the first 1000Hz tone. As a matter of
- fact, any relatively weak sound near a strong sound is
- masked. If you introduce another tone at 2000Hz also 18 dB
- below the first 1000Hz tone, you will hear this.
- You will have to turn down the 2000Hz tone to something like
- 45 dB below the 1000Hz tone before it will be masked by the
- first tone. So the further you get from a sound the less
- masking effect it has.
- The masking effect means that you can raise the noise floor
- around a strong sound because the noise will be masked anyway.
- And raising the noise floor is the same as using less bits
- and using less bits is the same as compression. Do you get it?
-
- Q. I don't get it.
- A. Well, let me try to explain how the MPEG Audio Layer-2 encoder
- goes about its thing. It divides the frequency spectrum (20Hz
- to 20kHz) into 32 subbands. Each subband holds a little slice
- of the audio spectrum. Say, in the upper region of subband 8,
- a 6500Hz tone with a level of 60dB is present. OK, the
- coder calculates the masking effect of this sound and finds
- that there is a masking threshold for the entire 8th
- subband (all sounds w. a frequency...) 35dB below this tone.
- The acceptable s/n ratio is thus 60 - 35 = 25 dB. This equals 4
- bit resolution. In addition there are masking effects on band
- 9-13 and on band 5-7, the effect decreasing with the distance
- from band 8.
- In a real-life situation you have sounds in most bands and the
- masking effects are additive. In addition the coder considers
- the sensitivity of the ear for various frequencies. The ear
- is a lot less sensitive in the high and low frequencies. Peak
- sensitivity is around 2 - 4kHz, the same region that the human
- voice occupies.
- The subbands should match the ear, that is each subband should
- consist of frequencies that have the same psychoacoustic
- properties. In MPEG Layer 2, each subband is 750Hz wide
- (with 48 kHz sampling frequency). It would have been better if
- the subbands were narrower in the low frequency range and
- wider in the high frequency range. That is the trade-off
- Layer-2 took in favour of a simpler approach.
- Layer-3 has a much higher frequency resolution (18 times
- more) - and that is one of the reasons why Layer-3 has a much
- better low bitrate performance than Layer-2.
- But there is more to it. I have explained concurrent masking,
- but the masking effect also occurs before and after a strong
- sound (pre- and postmasking).
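-
- [ The bit-allocation arithmetic from the subband-8 example above, in
- two lines of Python: 60 - 35 leaves 25 dB of signal-to-noise ratio to
- protect, and at roughly 6 dB per bit that is about 4 bits. A real
- encoder repeats this for all 32 subbands and then fits the result
- into the available bitrate. ]
-
-     def bits_needed(snr_db, db_per_bit=6.0):
-         return max(0, round(snr_db / db_per_bit))
-
-     required_snr = 60 - 35             # dB, the example from the text
-     print(bits_needed(required_snr))   # about 4 bits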
-
- Q. Before?
- A. Yes, if there is a significant (30 - 40dB ) shift in level.
- The reason is believed to be that the brain needs some
- processing time. Premasking is only about 2 to 5 ms. The
- postmasking can last up to 100 ms.
- Other bit-reduction techniques involve considering tonal and
- non-tonal components of the sound. For a stereo signal you
- may have a lot of redundancy between channels. All MPEG
- Layers may exploit these stereo effects by using a "joint-
- stereo" mode, with a most flexible approach for Layer-3.
- Furthermore, only Layer-3 further reduces the redundancy
- by applying Huffman coding.
-
- Q. What are the downsides?
- A. The coder calculates masking effects by an iterative process
- until it runs out of time. It is up to the implementor to
- spend bits in the least obtrusive fashion.
- For Layer 2 and Layer 3, the encoder works on 24 ms of sound
- (with 1152 samples at fs = 48 kHz) at a time. For some
- material, the time-window can be a problem. This is
- normally in a situation with transients where there are large
- differences in sound level over the 24 ms. The masking is
- calculated on the strongest sound and the weak parts will
- drown in quantization noise. This is perceived as a "noise-
- echo" by the ear. Layer 3 addresses this problem
- specifically by using a smaller analysis window (4 ms), if
- the encoder encounters an "attack" situation.
-
- Q. Tell me about the complexity. What are the hardware demands?
-
- A. Alright. First, we have to distinguish between decoder and
- encoder.
- Remember: MPEG coding is asymmetrical, with a much
- larger workload on the encoder than on the decoder.
- For a stereo decoder, various real-time implementations exist
- for Layer-2 and Layer-3. They are either based on single-DSP
- solutions or on dedicated MPEG audio decoder chips. So
- you need not worry about decoder complexity.
- For a stereo Layer-2-encoder, various DSP based solutions with
- one or more DSPs exist (with different quality, also).
- For a stereo Layer-3-encoder achieving ISO reference quality,
- the current real-time implementations use two DSP32C and
- two DSP56002.
-
- Q. How many audio channels?
- A. MPEG-1 allows for two audio channels. These can be either
- single (mono), dual (two mono channels), stereo or
- joint stereo (intensity stereo (Layer-2 and Layer-3) or m/s-
- stereo (Layer-3 only)).
- In normal (l/r) stereo one channel carries the left audio
- signal and one channel carries the right audio signal. In
- m/s stereo one channel carries the sum signal (l+r) and the
- other the difference (l-r) signal. In intensity stereo the
- high frequency part of the signal (above 2kHz) is combined.
- The stereo image is preserved but only the temporal envelope
- is transmitted.
- In addition MPEG allows for pre-emphasis, copyright marks and
- original/copy marks. MPEG-2 allows for several channels in
- the same stream.
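-
- [ A minimal Python sketch of the m/s matrixing just described:
- transmit the sum and difference instead of left and right, and get
- left and right back at the decoder. The factor 1/2 is only one
- possible normalisation. ]
-
-     left = [0.2, 0.5, -0.1]
-     right = [0.1, 0.4, -0.3]
-
-     mid  = [(l + r) / 2 for l, r in zip(left, right)]   # "l+r" channel
-     side = [(l - r) / 2 for l, r in zip(left, right)]   # "l-r" channel
-
-     print([m + s for m, s in zip(mid, side)])   # left again (up to rounding)
-     print([m - s for m, s in zip(mid, side)])   # right again (up to rounding)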
-
- Q. What about the audio codec delay?
- A. Well, the standard gives some figures of the theoretical
- minimum delay:
- Layer-1: 19 ms (<50 ms)
- Layer-2: 35 ms (100 ms)
- Layer-3: 59 ms (150 ms)
- The practical values are significantly above that. As they
- depend on the implementation, exact figures are hard to
- give. So the figures in brackets are just rough rule-of-thumb
- values.
- Yes, for some applications, a very short delay is of critical
- importance. E.g. in a feedback link, a reporter can only talk
- intelligibly if the overall delay is below around 10 ms.
- If broadcasters want to apply MPEG audio coding, they have to
- use "N-1" switches in the studio to overcome this problem
- (or appropriate echo-cancellers) - or they have to forget
- about MPEG altogether.
- But with most applications, these figures are small enough to
- present no extra problem. At least, if one can accept a Layer-
- 2 delay, one can most likely also accept the higher Layer-3
- delay.
-
- Q. OK, I am hooked! Where can I find more technical
- information about MPEG audio coding, especially about Layer-
- 3?
- A. Well, there is a variety of AES papers, e.g.
-
- K. Brandenburg, G. Stoll, ...: "The ISO/MPEG-Audio Codec: A
- Generic Standard for Coding of High Quality Digital Audio",
- 92nd AES, Vienna 1992, pp.3336
-
- E. Eberlein, H. Popp, ...: "Layer-3, a Flexible Coding
- Standard", 94th AES, Berlin 93, pp.3493
-
- K. Brandenburg, G. Zimmer, ...: "Variable Data-Rate Recording
- on a PC Using MPEG-Audio Layer-3", 95th AES, New York 93
-
- B. Grill, J. Herre,... : "Improved MPEG-2 Audio Multi-Channel
- Encoding", 96th AES, Amsterdam 94
-
- And for further information, please contact layer3@iis.fhg.de
-
- Q. Where can I get more details about MPEG audio?
- A. Still more details? No shit. You can get the full ISO spec
- from Omnicom. The specs do a fairly good job of obscuring
- exactly how these things are supposed to work... Jokes aside,
- there is no description of the coder in the specs. The specs
- describe the bitstream in great detail and suggest
- psychoacoustic models.
-
- Originally written by Morten Hjerde <100034,663@compuserve.com>,
- modified and updated by Harald Popp (layer3@iis.fhg.de).
-
- Harald Popp
- Audio & Multimedia ("Music is the *BEST*" - F. Zappa)
- Fraunhofer-IIS-A, Weichselgarten 3, D-91058 Erlangen, Germany
- Phone: +49-9131-776-340
- Fax: +49-9131-776-399
- email: popp@iis.fhg.de
-
- -------------------------------------------------------------------------------
-
- ~Subject: What is the Audio Layer 3 then ?
-
- Information about MPEG Audio Layer-3
- Version 1.51 - 1. 95
-
- This text is organized as a kind of Mini-FAQ (Frequently Asked
- Questions). It covers several topics:
-
- 1. ISO-MPEG Standard
- 2. MPEG Audio Codec Family ("Layer 1, 2, 3")
- 3. Applications
- 4. Products
- 5. Support by Fraunhofer-IIS
- 6. Shareware Information
-
- For further comments and questions regarding Layer-3, please contact:
- - layer3@iis.fhg.de
-
- For further information about MPEG, you may also like to contact:
- - phade@powerweb.de
-
-
- 1. ISO-MPEG Standard
-
- Q: What is MPEG, exactly?
- A: MPEG is the "Moving Picture Experts Group", working under the joint
- direction of the International Standards Organization (ISO) and the
- International Electro-Technical Commission (IEC). This group works on
- standards for the coding of moving pictures and associated audio.
-
- Q: What is the status of MPEG's work, then? What about MPEG-1, -2, and so
- on?
- A: MPEG approaches the growing need for multimedia standards step-by-
- step. Today, three "phases" are defined:
-
- MPEG-1:"Coding of Moving Pictures and Associated Audio for
- Digital Storage Media at up to about 1.5 MBit/s"
- Status: International Standard IS-11172, completed in October 1992
-
- MPEG-2:"Generic Coding of Moving Pictures and Associated
- Audio"
- Status: International Standard IS-13818, completed in November 1994
-
- MPEG-3: no longer exists (it has been merged into MPEG-2)
-
- MPEG-4: "Very Low Bitrate Audio-Visual Coding"
- Status: Call for Proposals, first deadline October 1, 1995
-
- Q: MPEG-1 and MPEG-2 are ready for use. What do the standards look like?
- A: Both standards consist of 4 main parts.
- The structure is the same for MPEG-1 and MPEG-2.
- -1: System describes synchronization and multiplexing of video and audio
- -2: Video describes compression of video signals
- -3: Audio describes compression of audio signals
- -4: Compliance Testing describes procedures for determining the characteristics
- of coded bitstreams and the decoding process and for testing compliance with
- the requirements stated in the other parts.
-
- Q: How do I get the MPEG documents?
- A: You order it from your national standards body.
- E.g., in Germany, please contact:
- DIN-Beuth Verlag, Auslandsnormen
- Mrs. Niehoff, Burggrafenstr. 6, D-10772 Berlin, Germany
- Phone: +49-30-2601-2757, Fax: +49-30-2601-1231
-
-
- 2. MPEG Audio Codec Family ("Layer 1, 2, 3")
-
- Q: Talking about MPEG audio coding, I heard a lot about "Layer 1, 2 and 3".
- What does it mean, exactly?
- A: MPEG describes the compression of audio signals using high performance
- perceptual coding schemes. It specifies a family of three audio coding
- schemes, simply called Layer-1,-2,-3, with increasing encoder complexity
- and performance (sound quality per bitrate) from 1 to 3.
- The three codecs are compatible in a hierarchical way, i.e. a Layer-N
- decoder is able to decode bitstream data encoded in Layer-N and all Layers
- below N (e.g., a Layer-3 decoder may accept Layer-1,-2 and -3, whereas a
- Layer-2 decoder may accept only Layer-1 and -2.)
-
- Q: So we have a family of three audio coding schemes. What does the MPEG
- standard define, exactly?
- A: For each Layer, the standard specifies the bitstream format and the
- decoder. To allow for future improvements, it does *not* specify the
- encoder, but an informative chapter gives an example for an encoder for
- each Layer.
-
- Q: What do the three audio Layers have in common?
- A: All Layers use the same basic structure. The coding scheme can be
- described as "perceptual noise shaping" or "perceptual subband / transform
- coding".
- The encoder analyzes the spectral components of the audio signal by
- calculating a filterbank or transform and applies a psychoacoustic model
- to estimate the just noticeable noise-level. In its quantization and coding
- stage, the encoder tries to allocate the available number of data bits in a
- way to meet both the bitrate and masking requirements.
- The decoder is much less complex. Its only task is to synthesize an audio
- signal out of the coded spectral components.
- All Layers use the same analysis filterbank (polyphase with 32 subbands).
- Layer-3 adds a MDCT transform to increase the frequency resolution.
- All Layers use the same "header information" in their bitstream, to support
- the hierarchical structure of the standard.
- All Layers have a similar sensitivity to biterrors. They use a bitstream
- structure that contains parts that are more sensitive to biterrors ("header",
- "bit allocation", "scalefactors", "side information") and parts that
- are less sensitive ("data of spectral components").
- All Layers support the insertion of programme-associated information
- ("ancillary data") into their audio data bitstream.
- All Layers may use 32, 44.1 or 48 kHz sampling frequency.
- All Layers are allowed to work with similar bitrates:
- Layer-1: from 32 kbps to 448 kbps
- Layer-2: from 32 kbps to 384 kbps
- Layer-3: from 32 kbps to 320 kbps
- The last two statements refer to MPEG-1; with MPEG-2, there is an
- extension for the sampling frequencies and bitrates (see below).
-
- Q: What are the main differences between the three Layers, from a global
- view?
- A: From Layer-1 to Layer-3,
- complexity increases (mainly true for the encoder),
- overall codec delay increases, and
- performance increases (sound quality per bitrate).
-
- Q: What are the main differences between MPEG-1 and MPEG-2 in the audio
- part?
- A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1, -2
- and -3. The new audio features of MPEG-2 are:
- "low sample rate extension" to address very low bitrate applications
- with limited bandwidth requirements (the new sampling frequencies
- are 16, 22.05 or 24 kHz, the bitrates extend down to 8 kbps),
- "multichannel extension" to address surround sound applications
- with up to 5 main audio channels (left, center, right, left surround,
- right surround) and optionally 1 extra "low frequency enhancement
- (LFE)" channel for subwoofer signals; in addition, a "multilingual
- extension" allows the inclusion of up to 7 more audio channels.
-
- Q: A lot of new stuff! Is this all compatible to each other?
- A: Well, more or less, yes - with the exception of the low sample rate
- extension. Obviously, a pure MPEG-1 decoder is not able to handle the
- new "half" sample rates.
-
- Q: You mean: compatible!? With all these extra audio channels? Please
- explain!
- A: Compatibility has been a major topic during the MPEG-2 definition phase.
- The main idea is to use the same basic bitstream format as defined in
- MPEG-1, with the main data field carrying two audio signals (called L0
- and R0) as before, and the ancillary data field carrying the multichannel
- extension information. Without going further into details, three terms can
- be explained here:
- "forwards compatible": the MPEG-2 decoder has to accept any
- MPEG-1 audio bitstream (that represents one or two audio channels)
- "backwards compatible": the MPEG-1 decoder should be able to
- decode the audio signals in the main data field (L0 and R0) of the
- MPEG-2 bitstream
- "Matrixing" may be used to get the surround information into L0 and
- R0:
- L0 = left signal + a * center signal + b * left surround signal
- R0 = right signal + a * center signal + b * right surround signal
- Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix of
- the full 5-channel information. An MPEG-2 decoder uses the multichannel
- extension information (3 more audio signals) to reconstruct the five
- surround channels.
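-
- [ A small Python sketch of the matrixing above. The coefficients a
- and b are placeholders (the standard defines the permitted values);
- the point is only that L0/R0 carry a downmix an MPEG-1 decoder can
- play, while an MPEG-2 decoder that also receives C, Ls and Rs can
- subtract them out again. ]
-
-     a, b = 0.7071, 0.7071          # example attenuations, not normative
-
-     def matrix(L, C, R, Ls, Rs):
-         return L + a * C + b * Ls, R + a * C + b * Rs
-
-     def dematrix(L0, R0, C, Ls, Rs):
-         return L0 - a * C - b * Ls, R0 - a * C - b * Rs
-
-     L0, R0 = matrix(L=1.0, C=0.5, R=0.8, Ls=0.2, Rs=0.1)
-     print(dematrix(L0, R0, C=0.5, Ls=0.2, Rs=0.1))   # ~ (1.0, 0.8)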
-
- Q: I heard something about a new NBC mode for MPEG-2 audio? What does
- it mean?
- A: "NBC" stands for "non-backwards compatible". During the development
- of the backwards compatible MPEG-2 standard, the experts encountered
- some trouble with the compatibility matrix. The introduced quantisation
- noise may become audible after dematrixing. Although some clever
- strategies have been devised to overcome this problem, the question
- remained how much better a non-compatible multichannel codec might
- perform.
- So ISO-MPEG decided to address that issue in a "NBC" working group -
- among the proponents are AT&T, Dolby, Fraunhofer, IRT, Philips, and
- Sony. Their work will lead to an addendum to the MPEG-2 standard
- (13818-8).
-
- Q: O.K., that should do for a first overview. Are there some papers with more
- detailed information?
- A: Sure! You'll find more technical information about MPEG audio coding
- in a variety of AES papers (AES = Audio Engineering Society). The AES
- organizes two conventions per year, and perceptual audio coding has been
- a topic since the middle of the 80s. Some interesting papers might be:
-
- K. Brandenburg, G. Stoll, et al.: "The ISO/MPEG-Audio Codec: A
- Generic Standard for Coding of High Quality Digital Audio", 92nd
- AES, Vienna Mar. 92, pp. 3336; revised version ("ISO-MPEG-1
- Audio: A Generic Standard...") published in the Journal of AES,
- Vol.42, No. 10, Oct. 94
-
- S. Church, B. Grill, et al.: "ISDN and ISO/MPEG Layer-3 Audio
- Coding: Powerful New tools for Broadcast and Audio Production",
- 95th AES, New York Oct. 93, pp. 3743
-
- E. Eberlein, H. Popp, et al.: "Layer-3, a Flexible Coding Standard",
- 94th AES, Berlin Mar. 93, pp. 3493
-
- B. Grill, J. Herre, et al.: "Improved MPEG-2 Audio Multi-Channel
- Encoding", 96th AES, Amsterdam Feb. 94, pp. 3865
-
- J. Herre, K. Brandenburg, et al.: "Second Generation ISO/MPEG
- Audio Layer-3 Coding", 98th AES, Paris Feb. 95
-
- F.-O. Witte, M. Dietz, et al.: "'Single Chip Implementation of an
- ISO/MPEG Layer-3 Decoder", 96th AES, Amsterdam Feb. 94, pp.
- 3805
-
- For ordering informations, contact:
-
- AES
- 60 East 42nd Street, Suite 2520
- New York, NY 10165-2520, USA
- phone: (212) 661-8528, fax: (212) 682-0477
-
- Another interesting publication: the "Proceedings of the Sixth Tirrenia
- International Workshop on Digital Communications", Tirrenia Sep. 93,
- Elsevier Science B.V. Amsterdam 94 (ISBN 0 444 81580 5).
-
- An excellent tutorial about MPEG-2 has recently been published in a
- German technical journal (Fernseh- und Kino-Technik); part 4, by E. F.
- Schroeder and J. Spille, talks about the audio part (7/8 94, p. 364 ff).
-
- And for further informations, please feel free to contact layer3@iis.fhg.de.
-
-
- 3. Applications
-
- Q: O.K., let us concentrate on one or two audio channels. Which Layer shall I
- use for my application?
- A: Good Question. Of course, it depends on all your requirements. But as a
- first approach, you should consider the available bitrate of your
- application as the Layers have been designed to support certain areas of
- bitrates most effectively. Roughly, today you can achieve a data reduction
- of around
- 1:4 with Layer-1 (or 192 kbps per audio channel),
- 1:6..8 with Layer-2 (or 128..96 kbps per audio channel), and
- 1:10..12 with Layer-3, (or 64..56 kbps per audio channel),
- and still the reconstructed audio signal will maintain a "CD-like" sound
- quality. This may be used as a first rule of thumb - let's talk about details
- later on.
-
- Q: Why does the performance increase with the number of the Layer? Why
- does the standard define a family of audio codecs instead of one single
- powerful algorithm?
- A: Well, the MPEG standard has forged together two main coding schemes
- that offered advantages either in complexity (MUSICAM) or in
- performance (ASPEC).
- Layer-2 is identical with the MUSICAM format. It has been designed as a
- trade-off between sound quality per bitrate and encoder complexity. So it is
- most useful for the "medium" range of bitrates (96..128 kbps per channel).
- For higher bitrates, even a simplified version, the Layer-1, performs well
- enough. Layer-1 has originally been developed for a target bitrate of 192
- kbps per channel. It is used as "PASC" within the DCC recorder.
- For lower bitrates (64 kbps per channel or even less), the Layer-2 format
- suffers from its built-in limitations, and with decreasing bitrate, artefacts
- become more and more audible. Here is the strong domain of the most
- powerful MPEG audio format, Layer-3. It specifies a set of unique features
- that all address one goal: to preserve as much sound quality as possible
- even at very low bitrates.
-
- Q: Wait a second! I understand that Layer-3 has been an important asset to
- the MPEG-1 standard, to address the high-quality low bitrate
- applications. With the advent of the "low sample rate extension (LSF)" in
- MPEG-2, is it still necessary to rely on Layer-3 to achieve a high-quality
- sound at low bitrates?
- A: Yes, for sure! Please, don't mix up MPEG-1 and MPEG-2 LSF. MPEG-2
- LSF is useful only for applications with limited bandwidth (11.25 kHz, at
- best). For applications with full bandwidth, MPEG-1 Layer-3 at 64 or 56
- kbps per channel achieves the best sound quality of all ISO codecs.
- For applications with limited bandwidth, MPEG-2 LSF Layer-3 provides
- an excellent sound quality at 56 kbps for monophonic speech signals and
- still a good sound quality at only 64 kbps total bitrate for stereo music
- signals (with around 10 kHz bandwidth). The latest MPEG ISO listening
- test (in September 94 at NTT Japan, doc. MPEG 94/437) proved the
- superior performance of Layer-3 in MPEG-1 and MPEG-2 LSF.
-
- Q: Tell me more about sound quality. How do you assess that?
- A: Today, there is no alternative to expensive listening tests. During the ISO-
- MPEG process, a number of international listening tests have been
- performed, with a lot of trained listeners. All these tests used the "triple
- stimulus, hidden reference" method and the "CCIR impairment scale" to
- assess the sound quality.
- The listening sequence is "ABC", with A = original, BC = pair of original
- / coded signal with random sequence, and the listener has to evaluate both
- B and C with a number between 1.0 and 5.0. The meaning of these values
- is:
- 5.0 = transparent (this should be the original signal)
- 4.0 = perceptible, but not annoying (first differences noticeable)
- 3.0 = slightly annoying
- 2.0 = annoying
- 1.0 = very annoying
-
- Q: Is there really no alternative to listening tests?
- A: No, there is not. With perceptual codecs, all traditional "quality"
- parameters (like SNR, THD+N, bandwidth) are rather useless, as any
- codec may introduce noise and distortions as long as it does not affect the
- perceived sound quality. So, listening tests are necessary, and, if carefully
- prepared and performed, lead to rather reliable results.
- Nevertheless, Fraunhofer-IIS works on objective sound quality assessment
- tools, too. There is already a first product available, the NMR meter, a
- real-time DSP-based measurement tool that nicely supports the analysis of
-