- Archive-name: mpeg-faq/part2
- Last-modified: 1996/06/02
- Version: v 4.1 96/06/02
- Posting-Frequency: bimonthly
-
- perceptual audio codecs. If you need more information about the Noise-to-
- Mask-Ratio (NMR) technology, feel free to contact nmr@iis.fhg.de.
-
- Q: O.K., back to these listening tests. Come on, tell me some results.
- A: Well, for details you should study one of those AES papers or MPEG
- documents listed above. The main result is that for low bitrates (64 kbps
- per channel or below), Layer-3 always scored significantly better than
- Layer-2. Another important conclusion is the draft recommendation of the
- task group TG 10/2 within the ITU-R. It recommends the use of low bit-
- rate audio coding schemes for digital sound-broadcasting applications
- (doc. BS.1115).
-
- Q: Very interesting! Tell me more about this recommendation!
- A: The task group TG 10/2 concluded its work in October 93. The draft
- recommendation defines three fields of broadcast applications:
- - distribution and contribution links (20 kHz bandwidth, no audible
- impairments with up to 5 cascaded codecs)
- Recommendation: Layer-2 with 180 kbps per channel
- - emission (20 kHz bandwidth)
- Recommendation: Layer-2 with 128 kbps per channel
- - commentary links (15 kHz bandwidth)
- Recommendation: Layer-3 with 60 kbps for monophonic and 120 kbps
- for stereophonic signals
-
- Q: I see. Medium bitrates - Layer-2, low bitrates - Layer-3. What about a
- bitrate of 96 kbps per channel that seems to be "somewhere in between"
- Layer-2 and Layer-3 domains?
- A: Interesting question. In fact, a total bitrate of 192 kbps for stereo music is
- useful for real applications, e.g. emission via satellite channels. The ITU-R
- required that emission codecs should score at least 4.0 on the CCIR
- impairment scale, even for the most critical material. At 128 kbps per
- channel, Dolby's AC-2, Layer-2 and Layer-3 fulfilled this requirement.
- Finally, Layer-2 got the recommendation mainly because of its
- "commonality with the distribution and contribution application".
- Further tests for emission were performed at 192 kbps joint-stereo coding.
- Layer-3 clearly met the requirements, Layer-2 fulfilled them only
- marginally, with doubts remaining during further tests with cascaded
- codecs in 1993. In the end, the task group decided to pronounce no
- recommendation for emission at 192 kbps.
-
- Q: Someone told me that in the ITU-R tests, there was some trouble with
- Layer-3, specifically on male voice in the German language. Still, Layer-3
- got the recommendation for "commentary links". Can you explain that?
- A: Yes. For commentary links, the quality requirements for speech were to be
- equivalent to 14-bit linear PCM, and for music, some perceptible
- impairments were to be tolerated. In the test in 1992, Layer-3 was
- the only codec that fulfilled these requirements (e.g. overall monophonic,
- Layer-3 scored 3.6 in contrast to Layer-2 at 2.05 - and for male German
- speech, Layer-3 scored 4.4 in contrast to Layer-2 at 2.4).
- Further tests were performed in 1993 using headphones. They showed that
- MPEG-1 Layer-3 with monophonic speech (the test item is German male
- voice) at 60 kbps did not fully meet the quality requirements. The ITU
- decided to recommend Layer-3 and to include a temporary footnote that
- will be removed as soon as an improved Layer-3 codec fulfills the
- requirements completely, i.e. even with that well-known critical male
- German speech item (for many other speech items, Layer-3 has no trouble
- at all).
-
- Q: O.K., a Layer-2 codec at low bitrates may sound poor today, but couldn't
- that be improved in the future? I guess you just told me before that the
- encoder is not fixed in the standard.
- A: Good thinking! As the sound quality mainly depends on the encoder
- implementation, it is true that there is no such thing as a "Layer-N"-
- quality. So we definitely only know the performance of the reference
- codecs used during the international tests. Who knows what will happen in
- the future? What we do know now, is:
- Today, in MPEG-1 and MPEG-2, Layer-3 provides the best sound quality
- at low bitrates, by far better than Layer-2.
- Tomorrow, both Layers may improve. Layer-2 has been designed as a
- trade-off between quality and complexity, so the bitstream format allows
- only limited innovations. In contrast, even the current reference Layer-3-
- codec does not exploit all of the powerful mechanisms inside the Layer-3
- bitstream format.
-
- Q: What other topics do I have to keep in mind? Tell me about the complexity
- of Layer-3.
- A: O.K. First, we have to distinguish between decoder and encoder, as the
- workload is distributed asymmetrically between them, i.e. the encoder
- needs much more computation power than the decoder.
- For a stereo Layer-3-decoder, you may either use a DSP (e.g. one
- DSP56002 from Motorola) or an "ASIC", like the mask-programmed DSP
- chip MAS 3503 C from Intermetall, ITT. Some rough requirements are:
- computation power: around 12 MIPS
- Data ROM: 2.5 Kwords
- Data RAM: 4.5 Kwords
- Program ROM: 2 to 4 Kwords
- word length: at least 20 bits
- Intermetall (ITT) estimated an overhead of around 30 % chip area for
- adding the necessary Layer-3 modules to a Layer-2-decoder. So you need
- not worry too much about decoder complexity.
- For a stereo Layer-3-encoder achieving reference quality, our current real-
- time implementations use two DSP32C (AT&T) and one DSP56002. With
- the advent of the 21060 (Analog Devices), even a single-chip stereo
- encoder comes into view.
-
- Q: Quality, complexity - what about the codec delay?
- A: Well, the standard gives some figures of the theoretical minimum delay:
- Layer-1: 19 ms (<50 ms)
- Layer-2: 35 ms (100 ms)
- Layer-3: 59 ms (150 ms)
- The practical values are significantly above that. As they depend on the
- implementation, exact figures are hard to give. So the figures in brackets
- are just rough thumb values - real codecs may show significantly higher
- values.
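-
- [A back-of-the-envelope illustration, not part of the standard's figures:
- one fixed contribution to these delays is the audio frame length itself,
- 384 PCM samples per frame for Layer-1 and 1152 for Layer-2 and Layer-3,
- to which filterbank and (for Layer-3) bit-reservoir delays are added.
- A small C sketch:]
-
  /* Frame duration alone already fixes part of the codec delay.
   * Layer-1 frames hold 384 PCM samples, Layer-2/-3 frames hold 1152;
   * the figures above additionally include filterbank and bit-reservoir
   * delays.  Compile: cc -o delay delay.c */
  #include <stdio.h>

  int main(void)
  {
      const double rates[3] = { 32000.0, 44100.0, 48000.0 };
      const int layer_samples[3] = { 384, 1152, 1152 };  /* Layer-1, -2, -3 */
      int r, l;

      for (r = 0; r < 3; r++)
          for (l = 0; l < 3; l++)
              printf("Layer-%d at %4.1f kHz: one frame = %4.1f ms\n",
                     l + 1, rates[r] / 1000.0,
                     1000.0 * layer_samples[l] / rates[r]);
      return 0;
  }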
-
- Q: For some applications, a very short delay is of critical importance: e.g. in a
- feedback link, a reporter can only talk intelligibly if the overall delay is
- below around 10 ms. Here, do I have to forget about MPEG audio at all?
-
- A: Not necessarily. In this application, broadcasters may use "N-1" switches
- in the studio to overcome this problem - or they may use equipment with
- appropriate echo-cancellers.
- But with many applications, these delay figures are small enough to
- present no extra problem. At least, if one can accept a Layer-2 delay, one
- can most likely also accept the higher Layer-3 delay.
-
- Q: Someone told me that, with Layer-3, the codec delay would depend on the
- actual audio signal, varying over the time. Is this really true?
- A: No. The codec delay does not depend on the audio signal. With all Layers,
- the delay depends on the actual implementation used in a specific codec, so
- different codecs may have different delays. Furthermore, the delay depends
- on the actual sample rate and bitrate of your codec.
-
- Q: All in all, you sound as if anybody should use Layer-3 for low bitrates.
- Why on earth do some vendors still offer only Layer-2 equipment for these
- applications?
- A: Well, maybe because they started to design and develop their systems
- rather early, e.g. in 1990. As Layer-2 is identical to MUSICAM, it has
- been available since the summer of 1990 at the latest. In that year, Layer-3
- development started and could be successfully finished at the end of 1991.
- So, for a certain time, vendors could only exploit the already existing part
- of the new MPEG standard.
- Now the situation has changed. All Layers are available, the standard is
- completed, and new systems may capitalize on the full features of MPEG
- audio.
-
- 4. Products
-
- Q: What are the main fields of application for Layer-3?
- A: Simply put: all applications that need high-quality sound at very low
- bitrates to store or transmit music signals. Some examples are:
- - high-quality music links via ISDN phone lines (basic rate)
- - sound broadcasting via low bitrate satellite channels
- - music distribution in computer networks with low demands for channel
- bandwidth and memory capacity
- - music memories for solid state recorders based on ROM chips
-
- Q: What kind of Layer-3 products are already available?
- A: An increasing number of applications benefit from the advanced features
- of MPEG audio Layer-3. Here is a list of companies that currently sell
- Layer-3 products. For further information, please contact these companies
- directly.
-
- Layer-3 Codecs for Telecommunication:
- - AETA, 361 Avenue du Gal de Gaulle (*)
- F-92140 Clamart, France
- Fax: +33-1-4136-1213 (Mr. Fric)
- (*) products announced for 1995
- - Dialog 4 System Engineering GmbH, Monreposstr. 57
- D-71634 Ludwigsburg, Germany
- Fax: +49-7141-22667 (Mr. Burkhardtsmaier)
- - PKI Philips Kommunikations Industrie, Thurn-und-Taxis-Str. 14
- D-90411 Nuernberg, Germany
- Fax: +49-911-526-3795 (Mr. Konrad)
- - Telos Systems, 2101 Superior Avenue
- Cleveland, OH 44114, USA
- Fax: +1-216-241-4103 (Mr. Church)
-
- Speech Announcement Systems:
- - Meister Electronic GmbH, Koelner Str. 37
- D-51149 Koeln, Germany
- Fax: +49-2203-1701-30 (Mr. Seifert)
-
- PC Cards (Hardware and/or Software):
- - Dialog 4 System Engineering GmbH, Monreposstr. 57
- D-71634 Ludwigsburg, Germany
- Fax: +49-7141-22667 (Mr. Burkhardtsmaier)
- - Proton Data, Marrensdamm 12 b
- D-24944 Flensburg, Germany
- Fax: +49-461-38169 (Mr. Nissen)
-
- Layer-3-Decoder-Chips:
- - ITT Intermetall GmbH, Hans-Bunte-Str. 19
- D-79108 Freiburg, Germany
- Fax: +49-761-517-2395 (Mrs. Mayer)
-
- Layer-3 Shareware Encoder/Decoder:
- - Mailbox System Nuernberg (MSN), Innerer Kleinreuther Weg 21
- D-90408 Nuernberg, Germany
- Fax: +49-911-9933661 (Mr. Hanft)
- Shareware (version 1.50) is available for:
- - IBM-PCs or Compatibles with MS-DOS:
- L3ENC.EXE and L3DEC.EXE should work on practically
- any PC with 386 type CPU or better. For the encoder, a
- 486DX33 or better is recommended.
- On a 486DX2/66 the current shareware decoder performs in
- 1:3 real-time, and the shareware encoder in 1:14 real-time
- (with stereo signals sampled with 44.1 kHz).
- - Sun workstations:
- On a SPARC station 10, the decoder works in real time, the
- encoder performs in 1:5 real-time.
- For more information, refer to chapter 6.
-
- 5. Support by Fraunhofer-IIS
-
- Q: I understand that Fraunhofer-IIS has been the main developer of MPEG
- audio Layer-3. What can they do for me?
- A: The Fraunhofer-IIS focuses on applied research. Its engineers have
- profound expertise in real-time implementations of signal-processing
- algorithms, especially of Layer-3. The IIS may support a specific Layer-3
- application in various ways:
- - detailed information
- - technical consulting
- - advanced C sources for encoder and decoder
- - training-on-the-job
- - research and development projects on contract basis.
- For more information, feel free to contact:
- - Fraunhofer-IIS, Weichselgarten 3
- D-91058 Erlangen, Germany
- Fax: +49-9131-776-399 (Mr. Popp)
-
- Q: What are the latest audio demonstrations disclosed by Fraunhofer-IIS?
- A: At the Tonmeistertagung 11.94 in Karlsruhe, Germany, the IIS
- demonstrated:
- - real-time Layer-3 decoder software (mono, 32 kHz fs) including sound
- output on ProAudioSpectrum running on a 486DX2/66
- - playback of Layer-3 stereo files from a CD-ROM that has been produced
- by Intermetall and contains Layer-3 data of up to 15 h of stereo music
- (among others, all Beethoven symphonies); the decoder is a small board
- that is connected to the parallel printer port. It mainly carries 3 chips: a
- PLD as data interface, the MAS 3503 C stereo decoder chip, and the
- ASCO Digital-Analog-Converter. The board has two cinch adapters that
- allow a very simple connection to the usual stereo amplifier.
- - music-from-silicon demonstration by using the standard 1 Mbyte
- EPROMs to store 1.5 minutes of CD-like quality stereo music
- - music link (with around 6 kHz bandwidth) via V.34 modem at 28.8 kbps
- and one analog phone line
-
- 6. Shareware Information
-
- The Layer 3 Shareware is copyright Fraunhofer - IIS 1994,1995.
- The shareware packages are available:
- - via anonymous ftp from fhginfo.fhg.de (153.96.1.4)
- You may download our Layer-3 audio software package from the directory
- /pub/layer3. You will find the following files:
-
- For IBM PCs:
- l3v150d1.txt a short description of the files found in l3v150d1.zip
- l3v150d1.zip encoder, decoder and documentation
- l3v150d2.txt a short description of the files found in l3v150d2.zip
- l3v150d2.zip sample bitstreams
-
- For SUN workstations:
- l3v150.sun.txt short description of the files found in
- l3v150.sun.tar.gz
- l3v150.sun.tar.gz encoder, decoder and documentation
- l3v150bit.sun.txt short description of the files found in
- l3v150bit.sun.tar.gz
- l3v150bit.sun.tar.gz sample bitstreams
-
- - via direct modem download (up to 14,400 bps)
- Modem telephone number : +49 911 9933662 Name: FHG
- Packet switching network: (0) 262 45 9110 10290 Name: FHG
- (For the telephone number, replace "+" with your appropriate international
- dial prefix, e.g. "011" for the USA.)
- Follow the menus as desired.
-
- - via shipment of diskettes (only including registration)
- You may order a diskette directly from:
- Mailbox System Nuernberg (MSN)
- Hanft & Hartmann
- Innerer Kleinreuther Weg 21
- D-90408 Nuernberg, Germany
-
- Please note: MSN will only ship a diskette if they get paid for the
- registration fee before. The registration fee is 85 Deutsche Mark (about 50
- US$) (plus sales tax, if applicable) for one copy of the package. The
- preferred method of payment is via credit card. Currently, MSN accepts
- VISA, Master Card / Eurocard / Access credit cards. For details see the file
- REGISTER.TXT found in the shareware package.
-
- You may reach MSN also via Internet: msn@iis.fhg.de
- or via Fax: +49 911 9933661
- or via BBS: +49 911 9933662 Name: FHG
- or via X25: 0262 45 9110 10290 Name: FHG
- (e.g. in USA, please replace "+" with "011")
-
- - via email
- You may get our shareware also by a direct request to msn@iis.fhg.de. In
- this case, the shareware is split into about 30 small uuencoded parts...
-
- SOFTWARE: MPEG Audio Layer 3 Shareware Codec and Windows Realtime Player
-
- ----------------------------------------------------------------
- MPEG Audio Codec and Windows REALTIME Player from Fraunhofer IIS
- ----------------------------------------------------------------
-
- Fraunhofer IIS announces l3enc/l3dec V2.00 and WinPlay3 V1.00.
-
- For high quality audio compression, the shareware l3enc/l3dec V2.00
- package is available for Linux, SUN, NeXT and DOS on
- <URL:ftp://ftp.fhg.de/pub/layer3>
- Versions for SGI and HP will follow soon.
-
- The shareware package for DOS
-
- <URL:ftp://ftp.fhg.de/pub/layer3/l3v200d1.zip>
-
- includes a demo version of WinPlay3, a Windows MPEG Audio Layer 3
- realtime-player.
-
- With MPEG Audio Layer 3 you can get a 12:1 compression with a CD like
- quality.
- Instead of 12 MByte / minute (stereo 44.1 kHz) you only need about
- 1 Mbyte / minute!
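-
- [A quick sanity check of these figures, added for illustration: raw CD
- audio runs at 44100 x 16 x 2 bits per second, i.e. roughly 10.6 MByte of
- PCM per minute (the "12 MByte" above is a rounded figure), so Layer-3 at
- 112-128 kbit/s lands near 11:1 to 12:1 and about 1 MByte per minute:]
-
  /* Back-of-the-envelope check of the "12:1, about 1 MByte per minute"
   * claim for 44.1 kHz stereo.  Compile: cc -o ratio ratio.c */
  #include <stdio.h>

  int main(void)
  {
      double pcm_bps = 44100.0 * 16.0 * 2.0;            /* ~1.41 Mbit/s     */
      int kbps[3] = { 112, 128, 192 };                  /* Layer-3 bitrates */
      int i;

      printf("PCM source: %.2f Mbit/s = %.2f MByte/min\n",
             pcm_bps / 1.0e6, pcm_bps / 8.0 * 60.0 / 1.0e6);
      for (i = 0; i < 3; i++) {
          double bps = kbps[i] * 1000.0;
          printf("%3d kbit/s: ratio %4.1f:1, %.2f MByte/min\n",
                 kbps[i], pcm_bps / bps, bps / 8.0 * 60.0 / 1.0e6);
      }
      return 0;
  }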
-
- More information can be found on
- <URL:ftp://ftp.fhg.de/pub/layer3/MPEG_Audio_L3_FAQ.html>
- or contact <URL:mailto:layer3@iis.fhg.de>
-
-
- Harald Popp
- Audio & Multimedia ("Music is the *BEST*" - F. Zappa)
- Fraunhofer-IIS-A, Weichselgarten 3, D-91058 Erlangen, Germany
- Phone: +49-9131-776-340
- Fax: +49-9131-776-399
- email: popp@iis.fhg.de
- P.S.: Look out for planetoid #3834!
-
- -------------------------------------------------------------------------------
-
- ~Subject: What is MPEG-1+ ?
-
- This was a little mail-talk between harti@harti.de (Stefan Hartmann)
- and hgordon@system.xingtech.com.
-
- Q: What is MPEG-1+ ?
-
- It's MPEG-1 at MPEG-2 (CCIR 601) resolution. It may be used
- for TV set-top boxes in broadcasting or video-on-demand projects
- to enhance the picture quality.
-
- Q: I see. Is this a new standard ?
-
- No. MPEG-1 allows frame sizes of up to 4095x4095 pixels, but
- that is usually not used.
-
- Q: So what's different ?
-
- I understand that the effective resolution is approximately 550 x 480.
- Typical datarates are 3.5Mbps - 5.5Mbps (sports programming and perhaps
- movies are higher).
-
- Q: Is the video quality lower than with real MPEG-2 movies ?
-
- The quality is better than cable TV, and in my area, we don't have cable.
- They de-interlace and compress the full frames. My understanding is that
- this is about 5%-10% less efficient than taking advantage of MPEG-2
- interfield motion vectors.
-
- Q: If the fields are deinterlaced, do you see interlace artifacts, i.e. a
- moving object in one field has already moved further in one direction than
- in the other field ?
-
- Probably the TV receiver outputs the signal interlaced again to the TV
- set, so this does not produce the interlace artifacts seen on
- PCs with live video windows displaying both fields....
-
- Q: Can you record this on a VCR at all ? Does the SAT receiver have a
- video output, so you can record movies to tape ?
-
- You should be able to record to tape, though they may have some record
- blocking hardware which has to be overcome with video stabilizing
- hardware.
-
- Q: What kind of realtime encoders do they use at the broadcast station ?
-
- CLI (Compression Labs) is the manufacturer, using C-Cube chipsets (10
- CL-4000's per MPEG-1+ encoder).
-
- Q: Is there any written info about this MPEG-1 Plus technology available on
- the net ?
-
- Not that I'm aware. Maybe C-Cube has a Web site.
-
-
- [So it's up to you, dear reader, to find more and to tell me where it is ;o) ]
-
- Frank Gadegast, phade@powerweb.de
-
- -------------------------------------------------------------------------------
-
- ~Subject: What is MPEG-2?
-
- MPEG-2 FAQ
- version 3.7 (May 11, 1995)
- by Chad Fogg (cfogg@chromatic.com)
-
- The MPEG (Moving Pictures Experts Group) committee began its life in
- late 1988 by the hand of Leonardo Chiariglione and Hiroshi Yasuda with
- the immediate goal of standardizing video and audio for compact discs.
- Over the next few years, participation amassed from international
- technical experts in the areas of Video, Audio, and Systems, reaching
- over 200 participants by 1992.
-
- By the end of the third year (1990), a syntax emerged, which when
- applied to code SIF video and compact disc audio sample rates at a
- combined coded bitrate of 1.5 Mbit/sec, approximated the perceptual
- quality of consumer video tape (VHS). After demonstrations proved that
- the syntax was generic enough to be applied to bit rates and sample
- rates far higher than the original primary target application, a second
- phase (MPEG-2) was initiated within the committee to define a syntax
- for efficient representation of broadcast video. Efficient
- representation of interlaced (broadcast) video signals was more
- challenging than the progressive (non-interlaced) signals coded by
- MPEG-1. Similarly, MPEG-1 audio was capable of only directly
- representing two channels of sound. MPEG-2 would introduce a scheme to
- decorrelate multichannel discrete surround sound audio.
-
- Need for a third phase (MPEG-3) was anticipated in 1991 for High
- Definition Television, although it was later discovered, during 1992
- and 1993, that the MPEG-2 syntax simply scaled with the bit rate,
- obviating the third phase. MPEG-4 was launched in late 1992 to explore
- the requirements of a more diverse set of applications, while finding a
- more efficient means of coding low bit rate/low sample rate video and
- audio signals.
-
- Today, MPEG (video and systems) is the exclusive syntax of the United
- States Grand Alliance HDTV specification, the European Digital Video
- Broadcasting Group, and the high density compact disc (led by rivals
- Sony/Philips and Toshiba).
-
- What is MPEG video syntax ?
-
- MPEG video syntax provides an efficient way to represent image
- sequences in the form of more compact coded data. The language of the
- coded bits is the syntax. For example, a few tokens can represent an
- entire block of 64 samples. MPEG also describes a decoding
- (reconstruction) process where the coded bits are mapped from the
- compact representation into the original, raw format of the image
- sequence. For example, a flag in the coded bitstream signals whether
- the following bits are to be decoded with a DCT algorithm or with a
- prediction algorithm. The algorithms comprising the decoding process
- are regulated by the semantics defined by MPEG. This syntax can be
- applied to exploit common video characteristics such as spatial
- redundancy, temporal redundancy, uniform motion, spatial masking, etc.
-
- MPEG Myths
-
- A brief summary of common myths.
-
- 1. Compression Ratios over 100:1
-
- Articles in the press and marketing literature will often make the
- claim that MPEG can achieve high quality video with compression ratios
- over 100:1. These figures often include the oversampling factors in
- the source video. In reality, the coded sample rate specified in an
- MPEG image sequence is usually not much larger than 30 times the
- specified bit rate. Pre-compression through subsampling is chiefly
- responsible for 3 digit ratios for all video coding methods, including
- those of the non-MPEG variety.
-
- 2. MPEG-1 is 352x240
-
- Both MPEG-1 and MPEG-2 video syntax can be applied at a wide range of
- bitrates and sample rates. The MPEG-1 that most people are familiar
- with has parameters of 30 SIF pictures (352 pixels x 240 lines) per
- second and a bitrate less than 1.86 megabits/sec----a combination
- known as "Constrained Parameters Bitstreams". This popular
- interoperability point is promoted by Compact Disc Video (White Book).
- In fact, it is syntactically possible to encode picture dimensions as
- high as 4095 x 4095 and bitrates up to 100 Mbit/sec. With the advent
- of the MPEG-2 specification, the most popular combinations have
- coagulated into Levels, which are described later in this text. The
- two most common are affectionately known as SIF (e.g. 352 pixels x 240
- lines x 30 frames/sec), or Low Level, and CCIR 601 (e.g. 720
- pixels/line x 480 lines x 30 frames/sec), or Main Level.
-
- 3. Motion Compensation displaces macroblocks from previous pictures
-
- Macroblock predictions are formed out of arbitrary 16x16 pixel (or 16x8
- in MPEG-2) areas from previously reconstructed pictures. There are no
- boundaries which limit the location of a macroblock prediction within
- the previous picture, other than the edges of the picture.
-
-
- 4. Display picture size is the same as the coded picture size
-
- In MPEG, the display picture size and frame rate may differ from the
- size (resolution) and frame rate encoded into the bitstream. For
- example, a regular pattern of pictures in a source image sequence may
- be dropped (decimated), and then each picture may itself be filtered
- and subsampled prior to encoding. Upon reconstruction, the picture may
- be interpolated and upsampled back to the source size and frame rate.
- In fact, the three fundamental phases (Source Rate, Coded Rate, and
- Display Rate) may differ by several parameters. The MPEG syntax can
- separately describe Coded and Display Rates through sequence_headers,
- but the Source Rate is known only by the encoder.
-
-
- 5. Picture coding types (I, P, B) all consist of the same macroblocks types.
-
- All macroblocks within an I picture must be coded Intra (like a
- baseline JPEG picture). However, macroblocks within a P picture may
- either be coded as Intra or Non-intra (temporally predicted from a
- previously reconstructed picture). Finally, macroblocks within the B
- picture can be independently selected as either Intra, Forward
- predicted, Backward predicted, or both forward and backward
- (Interpolated) predicted. The macroblock header contains an element,
- called macroblock_type, which can flip these modes on and off like
- switches. macroblock_type is possibly the single most powerful element
- in the whole of video syntax. Picture types (I, P, and B) merely enable
- macroblock modes by widening the scope of the semantics. The component
- switches are:
-
- 1. Intra or Non-intra
- 2. Forward temporally predicted (motion_forward)
- 3. Backward temporally predicted (motion_backward)
- (2+3 in combination represent "Interpolated")
- 4. conditional replenishment (macroblock_pattern).
- 5. adaptation in quantization (macroblock_quantizer).
- 6. temporally predicted without motion compensation
-
- The first 5 switches are mostly orthogonal (the 6th is derived from the
- 1st and 2nd in P pictures, and does not exist in B pictures). Some
- switches are non-applicable in the presence of others. For example, in
- an Intra macroblock, all 6 blocks by definition contain DCT data,
- therefore there is no need to signal either the macroblock_pattern or
- any of the temporal prediction switches. Likewise, when there is no
- coded prediction error information in a Non-intra macroblock, the
- macroblock_quantizer signal would have no meaning.
-
-
- 6. Sequence structure is fixed to a specific I,P,B frame pattern.
-
- A sequence may consist of almost any pattern of I, P, and B pictures
- (there are a few minor semantic restrictions on their placement). It
- is common in industrial practice to have a fixed pattern (e.g.
- IBBPBBPBBPBBPBB), however, more advanced encoders will attempt to
- optimize the placement of the three picture types according to local
- sequence characteristics in the context of more global
- characteristics. Each picture type carries a penalty when coupled with
- the statistics of a particular picture (temporal masking, occlusion,
- motion activity, etc.).
-
- The variable length codes of the macroblock_type switch provide a
- direct clue, but it is the full scope of the semantics of each picture type
- that spells out the costs and benefits. For example, if the image sequence
- changes little from frame-to-frame, it is sensible to code more B
- pictures than P. Since B pictures by definition are never fed back
- into the prediction loop (i.e. not used as prediction for future
- pictures), bits spent on the picture are wasted in a sense (B pictures
- are like temporal spackle). Application requirements also govern
- picture type placement: random access points, mismatch/drift reduction,
- channel hopping, program indexing, and error recovery & concealment.
-
-
- The 6 Steps to Claiming Bogusly High Compression Ratios:
-
- MPEG video is often quoted as achieving compression ratios over 100:1,
- when in reality the sweet spot rests between 8:1 and 30:1.
-
- Here's how the fabled greater-than-100:1 reduction ratio is derived for
- the popular Compact Disc Video (White Book) bitrate of 1.15 Mbit/sec.
-
- Step 1. Start with the oversampled rate
-
- Most MPEG video sources originate at a higher sample rate than the
- "target sample rate encoded into the final MPEG bitstream. The most
- popular studio signal, known canonically as D-1 or CCIR 601 digital
- video, is coded at 270 Mbit/sec.
-
- The constant, 270 Mbit/sec, can be derived as follows:
-
- Luminance (Y): 858 samples/line x 525 lines/frame x 30 frames/sec x
- 10 bits/sample ~= 135 Mbit/sec
-
- R-Y (Cr): 429 samples/line x 525 lines/frame x 30 frames/sec x
- 10 bits/sample ~= 68 Mbit/sec
-
- B-Y (Cb): 429 samples/line x 525 lines/frame x 30 frames/sec x
- 10 bits/sample ~= 68 Mbit/sec
-
- Total: 27 million samples/sec x 10 bits/sample = 270 Mbit/sec.
-
- So, our compression ratio is: 270/1.15... an amazing 235:1 !!
-
-
- Step 2. Include blanking intervals
-
- Only 720 out of the 858 luminance samples per line contain active
- picture information. In fact, the debate over the true number of
- active samples is the cause of many hair-pulling cat-fights at TV
- engineering seminars and conventions, so it is safer to say that the
- number lies somewhere between 704 and 720. Likewise, only 480 lines
- out of the 525 lines contain active picture information. Again, the
- actual number is somewhere between 480 and 496. For the purposes of
- MPEG-1's and MPEG-2's famous conformance points (Constrained Parameters
- Bitstreams and Main Level, respectively), the number shall be 704
- samples x 480 lines for luminance, and 352 samples x 480 lines for each
- of the two chrominance pictures. Recomputing the source rate, we arrive
- at:
-
- (luminance)
- 704 samples/line x 480 lines x 30 fps x 10 bits/sample ~= 101 Mbit/sec
-
- (chrominance)
- 2 components x 352 samples/line x 480 lines x 30 fps x 10 bits/sample
- ~= 101 Mbit/sec
-
- Total: ~ 203 Mbit/sec
-
- The ratio (203/1.15) is now only 176:1
-
-
- Step 3. Include higher bits/sample
-
- The MPEG sample precision is 8 bits. Studio equipment often quantize
- samples with 10 bits of accuracy. The 2-bit improvement to the dynamic
- range is considered useful for suppressing noise in multi-generation
- video.
-
- The ratio is now only 176 * (8/10), or 141:1
-
- Step 4. Include higher chroma ratio
-
- The famous CCIR-601 studio signal represents the chroma signals (Cb, Cr)
- with half the horizontal sample density as the luminance signal, but
- with full vertical resolution. This particular ratio of subsampled
- components is known as 4:2:2. However, MPEG-1 and MPEG-2 Main Profile
- specify the exclusive use of the 4:2:0 format, deemed sufficient for
- consumer applications, where both chrominance signals have exactly half
- the horizontal and vertical resolution as luminance (the MPEG Studio
- Profile, however, centers around the 4:2:2 macroblock structure). Seen
- from the perspective of pixels being comprised of samples from multiple
- components, the 4:2:2 signal can be expressed as having an average of 2
- samples per pixel (1 for Y, 0.5 for Cb, and 0.5 for Cr). Thanks to the
- reduction in the vertical direction (resulting in a 352 x 240
- chrominance frame), the 4:2:0 signal would, in effect, have an average
- of 1.5 samples per pixel (1 for Y, and 0.25 for Cb and Cr each). Our
- source video bit rate may now be recomputed as:
-
- 720 pixels x 480 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel
- = 124 Mbit/sec
-
- ... and the ratio is now 108:1.
-
- Step 5. Include pre-subsampled image size
-
- As a final act of pre-compression, the CCIR 601 frame is converted to
- the SIF frame by a subsampling of 2:1 in both the horizontal and
- vertical directions.... or 4:1 overall. Quality horizontal subsampling
- can be achieved by the application of a simple FIR filter (7 or 4 taps,
- for example), and vertical subsampling by either dropping every other
- field (in effect, dropping every other line) or again by an FIR filter
- (regulated by an interfield motion detection algorithm). Our ratio now
- becomes:
-
- 352 pixels x 240 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel
- ~= 30 Mbit/sec !!
-
- .. and the ratio is now only 26:1
-
- Thus, the true A/B comparison should be between the source sequence at
- the 30 Mbit/sec stage, the actual specified sample rate in the MPEG
- bitstream, and the reconstructed sequence produced from the 1.15
- Mbit/sec coded bitstream.
-
- Step 6. Don't forget the 3:2 pulldown
-
- A majority of high-end programs originates from film. Most of the
- movies encoded onto Compact Disc Video were captured and reproduced
- at 24 frames/sec. So, in such an image sequence, 6 out of the 30
- frames every second are in fact redundant and need not be coded into
- the MPEG bitstream, leading to the shocking discovery that the actual
- source bit rate has really been 24 Mbit/sec all along, and the
- compression ratio a mere 21:1 !!! Even at the seemingly modest 20:1
- ratio, discrepancies will appear between the 24 Mbit/sec source
- sequence and the reconstructed sequence. Only conservative ratios in
- the neighborhood of 8:1 have demonstrated true transparency for
- sequences with complex spatial-temporal characteristics (i.e. rapid,
- divergent motion and sharp edges, textures, etc.). However, if the
- video is carefully encoded by means of pre-processing and intelligent
- distribution of bits, higher ratios can be made to appear at least
- artifact-free.
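-
- [For the curious, the whole six-step arithmetic fits in one small C
- program. It follows the formulas quoted above, using the 704/352 active
- sample counts of step 2, so the printed ratios track the figures in the
- text:]
-
  /* Recompute the compression ratio after each of the six steps above,
   * for the 1.15 Mbit/s White Book bitrate.  Compile: cc -o steps steps.c */
  #include <stdio.h>

  int main(void)
  {
      const double target = 1.15e6;                    /* coded bitrate     */
      double r;

      r = 858.0 * 525 * 30 * 10 * 2;                   /* full 4:2:2 raster */
      printf("1: full D-1 raster    %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      r = (704.0 * 480 + 2 * 352.0 * 480) * 30 * 10;   /* active picture    */
      printf("2: active picture     %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      r *= 8.0 / 10.0;                                 /* 8 bits per sample */
      printf("3: 8-bit samples      %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      r = 720.0 * 480 * 30 * 8 * 1.5;                  /* 4:2:0 chroma      */
      printf("4: 4:2:0 chroma       %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      r = 352.0 * 240 * 30 * 8 * 1.5;                  /* SIF picture size  */
      printf("5: SIF subsampling    %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      r *= 24.0 / 30.0;                                /* 3:2 pulldown      */
      printf("6: 24 film frames/s   %6.1f Mbit/s -> %4.0f:1\n", r / 1e6, r / target);
      return 0;
  }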
-
-
- What are the parts of the MPEG document?
-
- The MPEG-1 specification (official title: ISO/IEC 11172, "Information
- technology - Coding of moving pictures and associated audio for digital
- storage media at up to about 1.5 Mbit/s", Copyright 1993) consists of
- five parts. Each document is a part of the ISO/IEC number 11172. The
- first three parts reached International Standard in 1993. Part 4
- reached IS in 1994. In mid 1995, Part 5 will go IS.
-
- Part 1---Systems: The first part of the MPEG standard has two primary
- purposes: 1). a syntax for transporting packets of audio and video
- bitstreams over digital channels and storage mediums (DSM), 2). a
- syntax for synchronizing video and audio streams.
-
- Part 2---Video: describes syntax (header and bitstream elements) and
- semantics (algorithms telling what to do with the bits). Video breaks
- the image sequence into a series of nested layers, each containing a
- finer granularity of sample clusters (sequence, picture, slice,
- macroblock, block, sample/coefficient). At each layer, algorithms are
- made available which can be used in combination to achieve efficient
- compression. The syntax also provides a number of different means for
- assisting decoders in synchronization, random access, buffer
- regulation, and error recovery. The highest layer, sequence, defines
- the frame rate and picture pixel dimensions for the encoded image
- sequence.
-
- Part 3---Audio: describes syntax and semantics for three classes of
- compression methods. Known as Layers I, II, and III, the classes trade
- increased syntax and coding complexity for improved coding efficiency
- at lower bitrates. Layer II is the industrial favorite, applied
- almost exclusively in satellite broadcasting (Hughes DSS) and compact
- disc video (White Book). Layer I has similarities in terms of
- complexity, efficiency, and syntax to the Sony MiniDisc and the Philips
- Digital Compact Cassette (DCC). Layer III has found a home in ISDN,
- satellite, and Internet audio applications. The sweet spots for the
- three layers are 384 kbit/sec (DCC), 224 kbit/sec (CD Video, DSS), and
- 128 kbit/sec (ISDN/Internet), respectively.
-
- Part 4---Conformance: (circa 1992) defines the meaning of MPEG
- conformance for all three parts (Systems, Video, and Audio), and
- provides two sets of test guidelines for determining compliance in
- bitstreams and decoders. MPEG does not directly address encoder
- compliance.
-
- Part 5---Software Simulation: Contains an example ANSI C language
- software encoder and compliant decoder for video and audio. An
- example systems codec is also provided which can multiplex and
- demultiplex separate video and audio elementary streams contained in
- computer data files.
-
-
- As of March 1995, the MPEG-2 volume consists of a total of 9 parts
- under ISO/IEC 13818. Part 2 was jointly developed with the ITU-T,
- where it is known as recommendation H.262. The full title is:
- Information Technology--Generic Coding of Moving Pictures and
- Associated Audio. ISO/IEC 13818. The first five parts are organized in
- the same fashion as MPEG-1 (Systems, Video, Audio, Conformance, and
- Software). The four additional parts are listed below:
-
- Part 6 Digital Storage Medium Command and Control (DSM-CC): provides a
- syntax for controlling VCR-style playback and random-access of
- bitstreams encoded onto digital storage mediums such as compact disc.
- Playback commands include Still frame, Fast Forward, Advance, Goto.
-
- Part 7 Non-Backwards Compatible Audio (NBC): addresses the need for a
- new syntax to efficiently decorrelate discrete multichannel surround
- sound audio. By contrast, MPEG-2 audio (13818-3) attempts to code the
- surround channels as ancillary data to the MPEG-1
- backwards-compatible Left and Right channels. This allows existing
- MPEG-1 decoders to parse and decode only the two primary channels while
- ignoring the side channels (parse to /dev/null). This is analogous to
- the Base Layer concept in MPEG-2 Scalable video. NBC candidates include
- non-compatible syntaxes such as Dolby AC-3. Final document is not
- expected until 1996.
-
- Part 8 10-bit video extension. Introduced in late 1994, this
- extension to the video part (13818-2) describes the syntax and
- semantics to coded representation of video with 10-bits of sample
- precision. The primary application is studio video (distribution,
- editing, archiving). Methods have been investigated by Kodak and
- Tektronix which employ Spatial scalability, where the 8-bit signal
- becomes the Base Layer, and the 2-bit differential signal is coded as
- an Enhancement Layer. Final document is not expected until 1997 or
- 1998. [Part 8 will be withdrawn]
-
- <IMG SRC="mpeg2lay.gif">
-
- <IMG SRC="mpeg2la2.gif">
-
- Part 9 Real-time Interface (RTI): defines a syntax for video on demand
- control signals between set-top boxes and head-end servers.
-
- What is the evolution of an MPEG/ISO document?
-
- In chronological order:
-
- Abbr. ISO/Committee notation Author's notation
- ----- ------------------------------- -----------------------------
- - Problem (unofficial first stage) barroom witticism or dare
- NI New work Item Napkin Item
- NP New Proposal Need Permission
- WD Working Draft We're Drunk
- CD Committee Draft Calendar Deadlock
- DIS Draft International Standard Doesn't Include Substance
- IS International Standard Induced patent Statements
-
-
- Introductory paper to MPEG?
-
- Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia
- Applications," Communications of the ACM, April 1991, Vol.34, No.4, pp.
- 47-58
-
-
- MPEG in periodicals?
-
- The following journals and conferences have been known to contain
- information relating to MPEG:
-
-
- IEEE Transactions on Consumer Electronics
- IEEE Transactions on Broadcasting
- IEEE Transactions on Circuits and Systems for Video Technology
- Advanced Electronic Imaging
- Electronic Engineering Times (EE Times)
- IEEE Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- International Broadcasting Convention (IBC)
- Society of Motion Pictures and Television Engineers Journal (SMPTE)
- SPIE conference on Visual Communications and Image Processing
-
-
- MPEG Book?
-
- Several MPEG books are under development.
-
- An MPEG book will be produced by the same team behind the JPEG book:
- Joan Mitchell and Bill Pennebaker.... along with Didier Le Gall. It is
- expected to be a tutorial on MPEG-1 video and some MPEG-2 video. Van
- Nostrand Reinhold, 1995.
-
- A book, in the Japanese language, has already been published (ISBN:
- 4-7561-0247-6). The title is simply MPEG, from ASCII Publishing.
-
- Keith Jack's second edition of Video Demystified, to be published in
- August 1995, will feature a large chapter on MPEG video. Information:
- ftp://ftp.pub.netcom/pub/kj/kjack/
-
-
-
- MPEG is a DCT based scheme?
-
- The DCT and Huffman algorithms receive the most press coverage (e.g.
- "MPEG is a DCT based scheme with Huffman coding"), but are in fact less
- significant when compared to the variety of coding modes signaled to
- the decoder as context-dependent side information. The MPEG-1 and
- MPEG-2 IDCT has the same definition as H.261, H.263, JPEG.
-
-
- What are constant and variable bitrate streams?
-
- Constant bitrate streams are buffer regulated to allow continuous
- transfer of coded data across a constant rate channel without causing
- an overflow or underflow to a buffer on the receiving end. It is the
- responsibility of the Encoder's Rate Control stage to generate
- bitstreams which prevent buffer overflow and underflow. The constant
- bit rate encoding can be modeled as a reservoir: variable sized coded
- pictures flow into the bit reservoir, but the reservoir is drained at a
- constant rate into the communications channel. The most challenging
- aspect of a constant rate encoder is, yes, to maintain constant channel
- rate (without overflowing or underflowing a buffer of a fixed depth) while
- maintaining constant perceptual picture quality.
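-
- [A toy sketch of this reservoir picture - not the normative VBV model,
- just the idea: variable-sized coded pictures flow in, the channel drains
- the buffer at a constant rate, and the encoder must keep the fill level
- between empty and full. The picture sizes below are invented:]
-
  /* Cartoon of constant-bitrate buffer regulation.  Coded pictures of
   * varying size enter the buffer; the channel removes bits at a constant
   * rate; the fill level must stay within [0, buffer_size].
   * Compile: cc -o cbr cbr.c */
  #include <stdio.h>

  int main(void)
  {
      const double channel_bps = 1.15e6;        /* constant channel rate       */
      const double drain = channel_bps / 30.0;  /* bits drained per picture    */
      const double buffer_size = 327680.0;      /* e.g. a 40 KByte buffer      */
      double pic_bits[12] = { 120000, 25000, 25000, 60000, 20000, 20000,
                              60000, 22000, 22000, 110000, 28000, 28000 };
      double fill = buffer_size / 2.0;          /* start half full             */
      int i;

      for (i = 0; i < 12; i++) {
          fill += pic_bits[i] - drain;          /* picture in, channel out     */
          printf("picture %2d: fill %8.0f bits  %s\n", i, fill,
                 fill < 0.0 ? "UNDERFLOW" :
                 fill > buffer_size ? "OVERFLOW" : "ok");
      }
      return 0;
  }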
-
- In the simplest form, variable rate bitstreams do not obey any buffer
- rules, but will maintain constant picture quality. Constant picture
- quality is easiest to achieve by holding the macroblock quantizer step
- size constant (e.g. level 16 of 31). In its most advanced form, a
- variable bitrate stream may be more difficult to generate than
- constant bitrate streams. In advanced variable bitrate streams, the
- instantaneous bit rate (piece-wise bit rate) may be controlled by
- factors such as: 1. local activity measured against activity over
- large time intervals (e.g. the full span of a movie), or 2.
- instantaneous bandwidth availability of a communications channel.
-
- Summary of bitstream types (bitrate type and applications):
-
- constant-rate:
- fixed-rate communications channels like the original Compact Disc,
- digital video tape, single channel-per-carrier broadcast signals, hard
- disk storage
-
- simple variable-rate:
- software decoders where the bitstream buffer (VBV) is the storage
- medium itself (very large); the macroblock quantization scale is
- typically held constant over a large number of macroblocks
-
- complex variable-rate:
- statistical multiplexing (multiple-channel-per-carrier broadcast
- signals), compact discs and hard disks where the servo mechanisms can
- be controlled to increase or decrease the channel delivery rate,
- networked video where the overall channel rate is constant but demand
- is variably shared by multiple users, bitstreams which achieve average
- rates over very long time averages
-
-
-
- What is statistical multiplexing ?
-
- Progressive explanation:
- In the simplest coded bitstream, a PCM (Pulse Coded Modulated) digital
- signal, all samples have an equal number of bits. Bit distribution in a
- PCM image sequence is therefore not only uniform within a picture,
- (bits distributed along zero dimensions), but is also uniform across
- the full sequence of pictures.
-
- Audio coding algorithms such as MPEG-1's Layer I and II are capable of
- distributing bits over a one dimensional space, spanned by a frame. In
- Layer II, for example, an audio channel coded at a bitrate of 128
- kbit/sec and a sample rate of 44.1 kHz will have frames (which consist of
- 1152 subband coefficients each) coded with approximately 3340 bits.
- Some subbands will receive more bits than others.
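-
- [The frame budget is easy to reproduce: one Layer II (or Layer III) frame
- covers 1152 PCM samples, so the bits available per frame are simply
- bitrate x 1152 / sample_rate. A small C illustration:]
-
  /* Bits available per 1152-sample audio frame at a few bitrate /
   * sample-rate combinations.  Compile: cc -o frames frames.c */
  #include <stdio.h>

  int main(void)
  {
      const double samples_per_frame = 1152.0;
      double rates[3] = { 32000.0, 44100.0, 48000.0 };
      double bitrates[3] = { 64000.0, 128000.0, 192000.0 };
      int r, b;

      for (r = 0; r < 3; r++)
          for (b = 0; b < 3; b++)
              printf("%4.1f kHz, %3.0f kbit/s: %5.0f bits per frame\n",
                     rates[r] / 1000.0, bitrates[b] / 1000.0,
                     bitrates[b] * samples_per_frame / rates[r]);
      return 0;
  }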
-
- In block-based still image compression methods which employ 2-D
- transform coding methods, bits are distributed over a 2 dimensional
- space (horizontal and vertical) within the block. Further, blocks
- throughout the picture may contain a varying number of bits as a
- result, for example, of adaptive quantization. For example, background
- sky may contain an average of only 50 bits per block, whereas complex
- areas containing flowers or text may contain more than 200 bits per
- block. In the typical adaptive quantization scheme, more bits are
- allocated to perceptually more complex areas in the picture. The
- quantization stepsizes can be selected against an overall picture
- normalization constant, to achieve a target bit rate for the whole
- picture. An encoder which generates coded image sequences comprised of
- independently coded still pictures, such as JPEG Motion video or MPEG
- Intra picture sequences, will typically generate coded pictures of
- equal bit size.
-
- MPEG non-intra coding introduces the concept of the distribution of
- bits across multiple pictures, augmenting the distribution space to 3
- dimensions. Bits are now allocated to more complex pictures in the
- image sequence, normalized by the target bit size of the group of
- pictures, while at a lower layer, bits within a picture are still
- distributed according to more complex areas within the picture. Yet in
- most applications, especially those of the Constant Bitrate class, a
- restriction is placed in the encoder which guarantees that after a
- period of time, e.g. 0.25 seconds, the coded bitstream achieves a
- constant rate (in MPEG, the Video Buffer Verifier regulates the
- variable-to-constant rate mapping). The mapping of an inherently
- variable bitrate coded signal to a constant rate allows consistent
- delivery of the program over a fixed-rate communications channel.
-
- Statistical multiplexing takes the bit distribution model to 4
- dimensions: horizontal, vertical, temporal, and program axis. The 4th
- dimension is enabled by the practice of multiplexing multiple programs
- (each, for example, with respective video and audio bitstreams) on a
- common data carrier. In the Hughes' DSS system, a single data carrier
- is modulated with a payload capacity of 23 Mbits/sec, but a typical
- program will be transported at average bit rate of 6 Mbit/sec each. In
- the 4-D model, bits may be distributed according to the relative
- complexity of each program against the complexities of the other
- programs of the common data carrier. For example, a program undergoing
- a rapid scene change will be assigned the highest bit allocation
- priority, whereas the program with a near-motionless scene will receive
- the lowest priority, or fewest bits.
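-
- [A minimal sketch of that fourth dimension - the complexity numbers are
- invented and real statistical multiplexers are far more elaborate, but
- the proportional split of one carrier among several programs is the core
- idea:]
-
  /* Split one carrier's payload among programs in proportion to an
   * (invented) instantaneous complexity measure.
   * Compile: cc -o statmux statmux.c */
  #include <stdio.h>

  int main(void)
  {
      const double carrier_bps = 23.0e6;              /* one data carrier    */
      double complexity[4] = { 9.0, 2.0, 5.0, 0.5 };  /* scene cut ... still */
      double total = 0.0;
      int p;

      for (p = 0; p < 4; p++)
          total += complexity[p];
      for (p = 0; p < 4; p++)
          printf("program %d: complexity %4.1f -> %5.2f Mbit/s\n",
                 p, complexity[p], carrier_bps * complexity[p] / total / 1.0e6);
      return 0;
  }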
-
-
- How does MPEG achieve compression?
-
- Here are some typical statistical conditions addressed by specific
- syntax and semantic tools:
-
- 1. Spatial correlation: transform coding with 8x8 DCT (a small numeric
- sketch follows this list).
-
- 2. Human Visual Response---less acuity for higher spatial frequencies:
- lossy scalar quantization of the DCT coefficients.
-
- 3. Correlation across wide areas of the picture: prediction of the DC
- coefficient in the 8x8 DCT block.
-
- 4. Statistically more likely coded bitstream elements/tokens: variable
- length coding of macroblock_address_increment, macroblock_type,
- coded_block_pattern, motion vector prediction error magnitude, DC
- coefficient prediction error magnitude.
-
-
- 5. Quantized blocks with sparse quantized matrix of DCT coefficients:
- end_of_block token (variable length symbol).
-
- 6. Spatial masking: macroblock quantization scale factor.
-
- 7. Local coding adapted to overall picture perception (content
- dependent coding): macroblock quantization scale factor.
-
- 8. Adaptation to local picture characteristics: block based coding,
- macroblock_type, adaptive quantization.
-
- 9. Constant stepsizes in adaptive quantization: new quantization scale
- factor signaled only by special macroblock_type codes. (adaptive
- quantization scale not transmitted by default).
-
- 10. Temporal redundancy: forward, backwards macroblock_type and motion
- vectors at macroblock (16x16) granularity.
-
- 11. Perceptual coding of macroblock temporal prediction error: adaptive
- quantization and quantization of DCT transform coefficients (same
- mechanism as Intra blocks).
-
- 12. Low quantized macroblock prediction error: No prediction error for
- the macroblock may be signaled within macroblock_type. This is the
- macroblock_pattern switch.
-
- 13. Finer granularity coding of macroblock prediction error: Each of
- the blocks within a macroblock may be coded or not coded. Selective
- on/off coding of each block is achieved with the separate
- coded_block_pattern variable-length symbol, which is present in the
- macroblock only if the macroblock_pattern switch has been set.
-
- 14. Uniform motion vector fields (smooth optical flow fields):
- prediction of motion vectors.
-
- 15. Occlusion: forwards or backwards temporal prediction in B
- pictures. Example: an object becomes temporarily obscured by another
- object within an image sequence. As a result, there may be an area of
- samples in a previous picture (forward reference/prediction picture)
- which has similar energy to a macroblock in the current picture (thus
- it is a good prediction), but no areas within a future picture
- (backward reference) are similar enough. Therefore only forwards
- prediction would be selected by macroblock type of the current
- macroblock. Likewise, a good prediction may only be found in a future
- picture, but not in the past. In most cases, the object, or
- correlation area, will be present in both forward and backward
- references. macroblock_type can select the best of the three
- combinations.
-
- 16. Sub-sample temporal prediction accuracy: bi-linearly interpolated
- (filtered) "half-pel" block predictions. Real world motion
- displacements of objects (correlation areas) from picture-to-picture do
- not fall on integer pel boundaries, but at arbitrary fractional positions. Half-pel
- interpolation attempts to extract the true object to within one order
- of approximation, often improving compression efficiency by at least 1
- dB.
-
- 17. Limited motion activity in P pictures: skipped macroblocks. When
- the motion vector is zero for both the horizontal and vertical vector
- components, and no quantized prediction error for the current
- macroblock is present. Skipped macroblocks are the most desirable
- element in the bitstream since they consume no bits, except for a
- slight increase in the bits of the next non-skipped macroblock.
-
- 18. Co-planar motion within B pictures: skipped macroblocks. When the
- motion vector is the same as the previous macroblocks, and no quantized
- prediction error for the current macroblock is present.
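-
- [To make items 1 and 2 above concrete, here is a tiny C sketch: an 8x8
- forward DCT followed by coarse scalar quantization of a smooth block
- leaves only a handful of nonzero coefficients. The sample block and the
- flat quantizer step of 16 are invented for illustration; they are not
- the standard quantizer matrices:]
-
  /* 8x8 forward DCT (textbook formula) plus flat scalar quantization of a
   * smooth ramp block.  Compile: cc -o dct dct.c -lm */
  #include <stdio.h>
  #include <math.h>

  #define N 8

  int main(void)
  {
      const double PI = 3.14159265358979323846;
      double f[N][N], F[N][N];
      int x, y, u, v, nonzero = 0;

      /* a smooth horizontal/vertical ramp, like a patch of background sky */
      for (y = 0; y < N; y++)
          for (x = 0; x < N; x++)
              f[y][x] = 100.0 + 4.0 * x + 1.0 * y;

      /* forward 2-D DCT-II */
      for (v = 0; v < N; v++)
          for (u = 0; u < N; u++) {
              double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
              double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
              double sum = 0.0;
              for (y = 0; y < N; y++)
                  for (x = 0; x < N; x++)
                      sum += f[y][x]
                           * cos((2 * x + 1) * u * PI / (2.0 * N))
                           * cos((2 * y + 1) * v * PI / (2.0 * N));
              F[v][u] = 0.25 * cu * cv * sum;
          }

      /* coarse scalar quantization with a flat step of 16 */
      for (v = 0; v < N; v++)
          for (u = 0; u < N; u++)
              if ((int)floor(F[v][u] / 16.0 + 0.5) != 0)
                  nonzero++;

      printf("nonzero quantized coefficients: %d of 64\n", nonzero);
      return 0;
  }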
-
- What is the difference between MPEG-1 and MPEG-2 syntax?
-
- Section D.9 of ISO/IEC 13818-2 is an informative piece of text
- describing the differences between MPEG-1 and MPEG-2 video syntax. The
- following is a little more informal.
-
- Sequence layer:
- MPEG-2 can represent interlaced or progressive video sequences,
- whereas MPEG-1 is strictly meant for progressive sequences since the
- target application was Compact Disc video coded at 1.2 Mbit/sec.
-
- MPEG-2 changed the meaning behind the aspect_ratio_information
- variable, while significantly reducing the number of defined aspect
- ratios in the table. In MPEG-2, aspect_ratio_information refers to the
- overall display aspect ratio (e.g. 4:3, 16:9), whereas in MPEG-1, the
- ratio refers to the particular pixel. The reduction in the entries of
- the aspect ratio table also helps interoperability by limiting the
- number of possible modes to a practical set, much like frame_rate_code
- limits the number of display frame rates that can be represented.
- Optional picture header variables called display_horizontal_size and
- display_vertical_size can be used to code unusual display sizes.
-
- frame_rate_code in MPEG-2 refers to the intended display rate, whereas
- in MPEG-1 it referred to the coded frame rate. In film source video,
- there are often 24 coded frames per second. Prior to bitstream
- coding, a good encoder will eliminate the redundant 6 frames or 12
- fields from a 30 frame/sec video signal which encapsulates an
- inherently 24 frame/sec video source. The MPEG decoder or display
- device will then repeat frames or fields to recreate or synthesize the
- 30 frame/sec display rate. In MPEG-1, the decoder could only infer the
- intended frame rate, or derive it based on the Systems layer time
- stamps. MPEG-2 provides specific picture header variables called
- repeat_first_field and top_field_first which explicitly signal which
- frames or fields are to be repeated, and how many times.
-
- To address the concern of software decoders which may operate at rates
- lower or different than the common television rates, two new variables
- in MPEG-2 called frame_rate_extension_d and frame_rate_extension_n can
- be combined with frame_rate_code to specify a much wider variety of
- display frame rates. However, in the current set of defined profiles
- and levels, these two variables are not allowed to change the value
- specified by frame_rate_code. Future extensions or Profiles of MPEG
- may enable them.
-
- In interlaced sequences, the coded macroblock height (mb_height) of a
- picture must be a multiple of 32 pixels, while the width, like MPEG-1,
- is a coded multiple of 16 pixels. A discrepancy between the coded
- width and height of a picture and the variables horizontal_size and
- vertical_size, respectively, occurs when either variable is not an
- integer multiple of macroblocks. All pixels must be coded within
- macroblocks, since there cannot be such a thing as fractional
- macroblocks. Never intended for display, these overhang pixels or
- lines exist along the right and bottom edges of the coded picture. The
- sample values within these trims can be arbitrary, but they can affect
- the values of samples within the current picture, and especially future
- coded pictures. In the current picture, pixels which reside within
- the same 8x8 block as the overhang pixels are affected by the ripples of
- DCT quantization error. In future coded pictures, their energy can
- propagate anywhere within an image sequence as a result of motion
- compensated prediction. An encoder should fill in values which are
- easy to code, and should probably avoid creating motion vectors which
- would cause the Motion Compensated Prediction stage to extract samples
- from these areas. The application should probably select
- horizontal_size and vertical_size that are already multiples of 16 (or
- 32 in the vertical case of interlaced sequences) to begin with.
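-
- [Rounding the coded size up to whole macroblocks is plain integer
- arithmetic; a 720x486 source (a common active size for 525-line video)
- shows the bottom overhang lines nicely:]
-
  /* Round the display size up to whole macroblocks: width to a multiple of
   * 16, height to a multiple of 32 for interlaced MPEG-2 sequences (16 for
   * progressive).  Compile: cc -o mbsize mbsize.c */
  #include <stdio.h>

  static int round_up(int size, int step)
  {
      return ((size + step - 1) / step) * step;
  }

  int main(void)
  {
      int horizontal_size = 720, vertical_size = 486, interlaced = 1;
      int coded_w = round_up(horizontal_size, 16);
      int coded_h = round_up(vertical_size, interlaced ? 32 : 16);

      printf("display %dx%d -> coded %dx%d (%d x %d macroblocks)\n",
             horizontal_size, vertical_size, coded_w, coded_h,
             coded_w / 16, coded_h / 16);
      return 0;
  }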
-
-
- Group of Pictures:
- The Group of Pictures layer is no longer a mandatory layer in MPEG-2.
- It is an optional header useful only for establishing a SMPTE time code
- or for indicating that certain B pictures at the beginning of an edited
- sequence comprise a broken_link. This occurs when the forward reference
- frame (previous in time to the current B picture) that the B picture
- requires for prediction has been removed from the bitstream by an
- editing process. In MPEG-1, the Group of Pictures header is mandatory,
- and must follow a sequence header.
-
-
- Picture layer:
- In MPEG-2, a frame may be coded progressively or interlaced, signaled
- by the progressive_frame variable. In interlaced frames
- (progressive_frame==0), frames may then be coded as either a frame
- picture (picture_structure==frame) or as two separately coded field
- pictures (picture_structure==top_field or
- picture_structure==bottom_field). Progressive frames are a logical
- choice for video material which originated from film, where all pixels
- are integrated or captured at the same time instant. Most electronic
- cameras today capture pictures in two separate stages: a top field
- consisting of all odd lines of the picture is captured in one time
- instant, followed by a bottom field of all even lines. Frame
- pictures provide the option of coding each macroblock locally as either
- field or frame. An encoder may choose field pictures to save memory
- storage or reduce the end-to-end encoder-decoder delay by one field
- period.
-
-
- There is no longer such a thing as D pictures in MPEG-2 syntax.
- However, Main Profile @ Main Level MPEG-2 decoders, for example, are
- still required to decode D pictures at Main Level (e.g. 720x480x30
- Hz). The usefulness of D pictures, a concept from the year 1990, had
- evaporated by the time MPEG-2 solidified in 1993.
-
- repeat_first_field was introduced in MPEG-2 to signal that a field or
- frame from the current frame is to be repeated for purposes of frame
- rate conversion (as in the 30 Hz display vs. 24 Hz coded example
- above). On average in a 24 frame/sec coded sequence, every other coded
- frame would signal the repeat_first_field flag. Thus the 24 frame/sec
- (or 48 field/sec) coded sequence would become a 30 frame/sec (60
- field/sec) display sequence. This process has been known for decades
- as 3:2 Pulldown. Most movies seen on NTSC displays since the advent of
- television have been displayed this way. Only within the past decade
- has it become possible to interpolate motion to create 30 truly unique
- frames from the original 24. Since the repeat_first_field flag is
- independently determined in every frame structured picture, the actual
- pattern can be irregular (it doesn't have to be every other frame
- literally). An irregularity would occur during a scene cut, for
- example.
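-
- [Counting fields makes the cadence obvious: every coded frame emits two
- or three display fields, so repeating a field on every other coded frame
- turns 24 coded frames/sec into 60 display fields/sec. A small sketch with
- a perfectly regular cadence (real streams may be irregular, as noted
- above):]
-
  /* 3:2 pulldown via repeat_first_field: 24 coded frames become 60 display
   * fields (30 display frames).  Compile: cc -o pulldown pulldown.c */
  #include <stdio.h>

  int main(void)
  {
      int coded_frames = 24, fields = 0, i;

      for (i = 0; i < coded_frames; i++) {
          int repeat_first_field = (i % 2 == 0);   /* regular cadence       */
          fields += 2 + repeat_first_field;        /* 2 fields + 1 repeated */
      }
      printf("%d coded frames -> %d display fields (%.0f display frames/s)\n",
             coded_frames, fields, fields / 2.0);
      return 0;
  }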
-
-
- Slice:
- To aid implementations which break the decoding process into parallel
- operations along horizontal strips within the same picture, MPEG-2
- introduced a general semantic mandatory requirement that all
- macroblock rows must start and end with at least one slice. Since a
- slice commences with a start code, it can be identified by
- inexpensively parsing through the bitstream along byte boundaries.
- Before, an implementation might have had to parse all the variable
- length tokens between each slice (thereby completing a significant
- stage of decoding process in advance) to know the exact position of
- each macroblock within the bitstream. In MPEG-1, it was possible to
- code a picture with only a single slice. Naturally, the mandatory
- slice per macroblock row restriction also facilitates error recovery.
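-
- [Because start codes are byte aligned and begin with the prefix 00 00 01,
- slices (start code values 0x01 through 0xAF) can be located with a
- trivial byte scan, without decoding any variable-length data. The buffer
- below is a made-up fragment for illustration:]
-
  /* Scan a byte buffer for slice start codes (prefix 00 00 01, value
   * 0x01..0xAF).  Compile: cc -o slices slices.c */
  #include <stdio.h>

  int main(void)
  {
      unsigned char buf[] = { 0x00, 0x00, 0x01, 0x00,   /* picture start code */
                              0x12, 0x34,
                              0x00, 0x00, 0x01, 0x01,   /* a slice            */
                              0x56, 0x78, 0x9a,
                              0x00, 0x00, 0x01, 0x02,   /* another slice      */
                              0xbc, 0xde };
      int i, n = (int)sizeof buf;

      for (i = 0; i + 3 < n; i++)
          if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01
              && buf[i + 3] >= 0x01 && buf[i + 3] <= 0xaf)
              printf("slice start code 0x%02x at byte offset %d\n",
                     buf[i + 3], i);
      return 0;
  }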
-
- MPEG-2 also added the concept of the slice_id. This optional 6-bit
- element signals which picture a particular slice belongs to. In badly
- mangled bitstreams, the location of the picture headers could become
- garbled. slice_id allows a decoder to place a slice in the proper
- location within a sequence. Other elements in the slice header, such
- as slice_vertical_position, and the macroblock_address_increment of the
- first macroblock in the slice uniquely identify the exact macroblock
- position of the slice within the picture. Thus within a window of 64
- pictures, a lost slice can find its way.
-
-
-
- Macroblock:
- Motion vectors are now always represented along a half-pel grid. The
- usefulness of an integer-pel grid (option in MPEG-1) diminished with
- practice. An intrinsic half-pel accuracy can encourage use by encoders
- for the significant coding gain which half-pel interpolation offers.
-
- In both MPEG-1 and MPEG-2, the dynamic range of motion vectors is
- specified on a picture basis. A set of pictures corresponding to a
- rapid motion scene may need a motion vector range of up to +/- 64
- integer pixels. A slower moving interval of pictures may need only a
- +/- 16 range. Due to the syntax by which motion vectors are signaled in
- a bitstream, pictures with little motion would suffer unnecessary bit
- overhead in describing motion vectors in a coordinate system
- established for a much wider range. MPEG-1's f_code picture header
- element prescribed a radius shared by horizontal and vertical motion
- vector components alike. It later became practice in industry to have a
- greater horizontal search range (motion vector radius) than vertical,
- since motion tends to be more prominent across the screen than up or
- down (vertical). Secondly, a decoder has a limited frame buffer size
- in which to store both the current picture under decoding and the set
- of pictures (forward, backward) used for prediction (reference) by
- subsequent pictures. A decoder can write over the pixels of the oldest
- reference picture as soon as it no longer is needed by subsequent
- pictures for prediction. A restricted vertical motion vector range
- creates a sliding window, which starts at the top of the reference
- picture and moves down as the macroblocks in the current picture are
- decoded in raster order. The moment a strip of pixels passes outside
- this window, they have ended their life in the MPEG decoding loop. As
- a result of all this, MPEG-2 introduced separate horizontal and
- vertical range specifiers (f_code[][0] for horizontal, and f_code[][1]
- for vertical), and placed greater restrictions on the maximum vertical
- range than on the horizontal range. In Main Level frame pictures, this
- range is [-128,+127.5] vertically and [-1024,+1023.5]
- horizontally. In field pictures, the vertical range is restricted to
- [-64,+63.5].
-
- Macroblock stuffing is now illegal in MPEG-2. The original intent
- behind stuffing in MPEG-1 was to provide a means for finer rate control
- adjustment at the macroblock layer. Since no self-respecting encoder
- would waste bits on such an element (it does not contribute to the
- refinement of the reconstructed video signal), and since this unlimited
- loop of stuffing variable length codes represent a significant headache
- for hardware implementations which have a fixed window of time in which
- to parse and decode a macroblock in a pipeline, the element was
- eliminated in January 1993 from the MPEG-2 syntax. Some feel that
- macroblock stuffing was beneficial since it permitted macroblocks to be
- coded along byte boundaries. A good compromise could have been a
- limited number of stuffs per macroblock. If stuffing is needed for
- purposes of rate control, an encoder can pad extra zero bytes before
- the start code of the next slice. If stuffing is required in the last
- row of macroblocks of the picture, the picture start code of the next
- picture can be padded with an arbitrary number of bytes. If the
- picture happens to be the last in the sequence, the sequence_end_code
- can be stuffed with zero bytes.
-
- The dct_type flag in both Intra and non-Intra coded macroblocks of
- frame structured pictures signals that the reconstructed samples output
- by the IDCT stage shall be organized in field or frame order. This
- flag provides an encoder with a sort of poor man's motion_type by
- adapting to the interparity (i.e. interfield) characteristics of the
- macroblock without signaling a need for motion vectors via the
- macroblock_type variable. dct_type plays an essential role in Intra
- frame pictures by organizing lines of a common parity together when
- there is significant interfield motion within the macroblock. This
- increases the decorrelation efficiency of the DCT stage. For
- non-intra macroblocks, dct_type organizes the 16 luminance lines (8
- lines per chrominance block) of the macroblock prediction error. In
- combination with motion_type, the interpretation is:
-
-
- dct_type   motion_format   interpretation
- --------   -------------   --------------
- frame      Intra coded     block data is frame correlated
- field      Intra coded     block data is more strongly correlated along
-                            lines of opposite parity
-
-