- Subject: MPEG-FAQ: multimedia compression [2/6]
- Newsgroups: comp.graphics,comp.graphics.animation,comp.compression,comp.multimedia,alt.binaries.multimedia,alt.binaries.pictures.utilities,alt.binaries.pictures,alt.binaries.pictures.d,alt.answers,comp.answers,news.answers
- From: phade@cs.tu-berlin.de (Frank Gadegast)
- Date: 22 Aug 1994 12:28:48 GMT
-
- Archive-name: mpeg-faq/part2
- Last-modified: 1994/08/22
- Version: v 3.2 94/08/22
- Posting-Frequency: bimonthly
-
-
- BEGIN -------------------- CUT HERE --------------------- 2/6
- for example, the MPEG-1 style sequence_header() is followed by
- sequence_extension() which is exclusive to MPEG-2. Some extension
- headers are specific to MPEG-2 profiles. For example,
- sequence_scalable_extension() is not allowed in Main Profile.
-
- A simple program need only scan the coded bitstream for byte-aligned
- start codes to determine whether the stream is MPEG-1 or MPEG-2.
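-
- A minimal sketch of such a scanner, assuming a raw video elementary
- stream already held in memory. The start code values (0x000001B3 for
- the sequence header, 0x000001B5 for an extension) are those of the
- video syntax; the function name classify_video_stream is made up for
- this illustration.

```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"  # sequence_header()
EXTENSION_START_CODE = b"\x00\x00\x01\xb5"  # extension start code, MPEG-2 only
START_CODE_PREFIX = b"\x00\x00\x01"         # every start code begins with this


def classify_video_stream(data: bytes) -> str:
    """Return 'MPEG-2', 'MPEG-1' or 'unknown' for a video elementary stream."""
    pos = data.find(SEQUENCE_HEADER_CODE)
    if pos < 0:
        return "unknown"
    # Find the first start code that follows the sequence header.
    nxt = data.find(START_CODE_PREFIX, pos + 4)
    if nxt < 0 or nxt + 4 > len(data):
        return "unknown"
    # MPEG-2 requires sequence_extension() right after sequence_header().
    if data[nxt:nxt + 4] == EXTENSION_START_CODE:
        return "MPEG-2"
    return "MPEG-1"
```

- No bits are parsed or decoded; only byte-aligned patterns are compared.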
-
- Q. What is the precision of MPEG samples?
- A. By definition, MPEG samples have no more and no less than 8 bits of
- uniform sample precision (256 quantization levels). For luminance
- (which is unsigned) data, black corresponds to level 0, white
- is level 255. However, in the CCIR recommendation 601 luminance range,
- levels 0 through 15 and 236 through 255 are reserved for blanking signal
- excursions. MPEG currently has no such clipped excursion restrictions.
-
- Q. Is it MPEG-2 (arabic numbers) or MPEG-II (roman)?
-
- A. Committee insiders most often use the arabic notation with the
- hyphen, e.g. MPEG-2. Only the most retentive use the official
- designation: Phase 2. In fact, M.P.E.G. itself is a nickname. The
- official name is: ISO/IEC JTC1 SC29 WG11. The militaristic lingo has
- so far managed to keep the enemy (DVI) confused and out of the picture.
-
- ISO: International Organization for Standardization
- IEC: International Electrotechnical Commission
- JTC1: Joint Technical Committee 1
- SC29: Sub-committee 29
- WG11: Work Group 11 (moving pictures with... uh, audio)
-
- Q. Why MPEG-2? Wasn't MPEG-1 enough?
-
- A. MPEG-1 was optimized for CD-ROM or applications at about 1.5 Mbit/sec.
- Video was strictly non-interlaced (i.e. progressive). The international
- co-operation had executed so well for MPEG-1 that the committee began to
- address applications at broadcast TV sample rates using the CCIR 601
- recommendation (720 samples/line by 480 lines per frame by 30 frames per
- second... or about 15.5 million samples/sec including 4:2:0 chroma, 15.2
- million at the 704-sample width used elsewhere in this FAQ) as the
- reference.
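-
- The sample-rate figure can be checked with a few lines of arithmetic.
- The sketch below assumes 4:2:0 chroma (two quarter-size chroma planes,
- hence the factor 1.5); the helper name samples_per_second is
- illustrative only.

```python
def samples_per_second(width, height, frame_rate, chroma_factor=1.5):
    """Luma plus chroma samples per second.

    chroma_factor is 1.5 for 4:2:0 and 2.0 for 4:2:2 (assumed values
    following from the relative chroma plane sizes).
    """
    return width * height * frame_rate * chroma_factor


full_width = samples_per_second(720, 480, 30)    # about 15.55 million
active_width = samples_per_second(704, 480, 30)  # about 15.21 million
```

- The 720-sample line gives roughly 15.5 million samples/sec, and the
- 704-sample width roughly 15.2 million.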
-
- Unfortunately, today's TV scanning pattern is interlaced. This
- introduces a duality in block coding: do local redundancy areas
- (blocks) exist exclusively in a field or a frame...
- (or a particle or wave) ? The answer of course is that some blocks
- are one or the other at different times, depending on motion activity.
-
- The additional man years of experimentation and implementation between
- MPEG-1 and MPEG-2 improved the method of block-based transform coding.
-
- Q. How do MPEG and JPEG differ?
-
- A. The most fundamental difference is MPEG's use of block-based motion
- compensated prediction (MCP)---a general method falling into the
- temporal DPCM category.
-
- The second most fundamental difference is in the target application.
- JPEG adopts a general purpose philosophy: independence from color space
- (up to 255 components per frame) and quantization tables for each
- component. Extended modes in JPEG include two sample precisions (8 and
- 12 bit sample accuracy), combinations of frequency progressive, spatially
- progressive, and amplitude progressive scanning modes. Color independence
- is made possible thanks to downloadable Huffman tables.
-
- Since MPEG is targeted for a set of specific applications, there is
- only one color space (4:2:0 YCbCr), one sample precision (8 bits), and
- one scanning mode (sequential). Luminance and chrominance share
- quantization tables. The range of sampling dimensions is more limited
- as well. MPEG adds adaptive quantization at the macroblock (16 x 16 pixel
- area) layer. This permits both smoother bit rate control
- and more perceptually uniform quantization throughout the picture and
- image sequence. Adaptive quantization is part of the JPEG-2 charter.
- MPEG variable length coding tables are non-downloadable, and are
- therefore optimized for a limited range of compression ratios
- appropriate for the target applications.
-
- The local spatial decorrelation methods in MPEG and JPEG are very similar.
- Picture data is block transform coded with the two-dimensional orthonormal
- 8x8 DCT. The resulting 63 AC transform coefficients are mapped in a
- zig-zag pattern to statistically increase the runs of zeros. Coefficients
- of the vector are then uniformly scalar quantized, run-length coded, and
- finally the run-length symbols are variable length coded using a
- canonical (JPEG) or modified Huffman (MPEG) scheme. Global frame
- redundancy is reduced by 1-D DPCM of the block DC coefficients, followed
- by quantization and variable length entropy coding.
-
- Frame --MCP--> 8x8 spatial block --DCT--> 8x8 frequency block
-   --ZZ--> zig-zag scan --Q--> quantization
-   --RLC--> run-length coding --VLC--> variable length coding.
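-
- The zig-zag and run-length stages of the chain above can be sketched
- as follows. This is an illustration only: the function names are made
- up, one step size is used for every coefficient (real coders apply a
- quantization matrix), and truncation toward zero is an assumed rounding
- rule.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block, step):
    """Quantize the 63 AC coefficients and return (run, level) events."""
    events, run = [], 0
    for r, c in zigzag_order(len(block))[1:]:  # skip the DC term at (0, 0)
        level = int(block[r][c] / step)        # truncation toward zero
        if level == 0:
            run += 1
        else:
            events.append((run, level))
            run = 0
    return events  # trailing zeros are left to the end-of-block symbol
```

- Each (run, level) event would then be mapped to a variable length code.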
-
- The similarities have made it possible for the development of hard-wired
- silicon that can code both standards. Even microcoded architectures can
- better optimize through hardwired instruction primitives or functional
- blocks. There are many additional minor differences. They include:
-
- 1. DCT and quantization precision in MPEG is 9-bits since the macroblock
- difference operation expands the 8-bit signal precision by one bit.
-
- 2. Quantization in MPEG-1 forces quantized coefficients to become
- odd values (oddification).
-
- 3. JPEG run-length coding produces run-size tokens (run of zeros,
- non-zero coefficient magnitude) whereas MPEG produces fully
- concatenated run-level tokens that do not require magnitude
- differential bits.
-
- 4. DC values in MPEG-1 are limited to 8-bit precision (a constant
- stepsize of 8), whereas JPEG DC precision can occupy all possible
- 11-bits. MPEG-2, however, re-introduced extra DC precision.
-
-
- Q. What happened to MPEG-3?
-
- A. MPEG-3 was to have targeted HDTV applications with sampling dimensions
- up to 1920 x 1080 x 30 Hz and coded bitrates between 20 and 40 Mbit/sec.
- It was later discovered that with some (compatible) fine tuning, MPEG-2
- and MPEG-1 syntax worked very well for HDTV rate video. The key is
- to maintain an optimal balance between sample rate and coded bit rate.
-
- Also, the standardization window for HDTV was rapidly closing. Europe
- and the United States were on the brink of committing to analog-digital
- subnyquist hybrid algorithms (D-MAC, MUSE, et al). European all-digital
- projects such as HD-DIVINE and VADIS demonstrated better picture quality
- with respect to bandwidth using the MPEG syntax. In the United States, the
- Sarnoff/NBC/Philips/Thomson HDTV consortium had used MPEG-1 syntax from
- the beginning, and with the exception of motion artifacts (due to
- limited search range in the encoder), was deemed to have the best picture
- quality of all three digital proponents.
-
- HDTV is now part of the MPEG-2 High-1440 Level and High Level toolkit.
-
- Q. What is MPEG-4?
- A. MPEG-4 targets the Very Low Bitrate applications defined loosely
- as having sampling dimensions up to 176 x 144 x 10 Hz and coded
- bit rates between 4800 and 64,000 bits/sec. This new standard would
- be used, for example, in low bit rate videophones over analog
- telephone lines.
-
- This effort is in the very early stages. Morphology, fractals, model
- based, and anal retentive block transform coding are all in the offing.
- MPEG-4 is now in the application identification phase.
-
- Q. Where can I get a copy of the latest MPEG-2 draft?
- A. Contact your national standards body (e.g. ANSI Sales in NYC for the U.S.)
-
- Q. What are the latest working drafts of MPEG-2?
- A. The latest versions of video (version 4), and systems were produced at
- the Brussels meeting (September 10, 1993). The latest audio working
- draft was produced in New York (July 1993).
-
- MPEG-2 Video, Audio, and Systems will reach CD at the November 1993
- Seoul, Korea meeting.
-
- Q. What is the latest version of the MPEG-1 documents?
- A. Systems (ISO/IEC IS 11172-1), Video (ISO/IEC IS 11172-2), and Audio
- (ISO/IEC IS 11172-3) have reached the final document stage. Part 4,
- Conformance Testing, is currently a CD.
-
- Q. What is the evolution of standard documents?
- A. In chronological order:
-
- New Proposal (NP)
- Working Draft (WD)
- Committee Draft (CD)
- Draft International Standard (DIS)
- International Standard (IS)
-
-
- Q. When will an MPEG-2 decoder chip be available?
- A. Several chips will be sampling in late 1993. For reasons of economy
- and scale in the cable TV application, all are single-chip (not including
- DRAM and host CPU/controller) implementations.
- They are:
-
- SGS-Thomson STi-3500
- first MPEG-2 chip on market
- multi-tap binary horizontal sample rate convertor.
- pan & scanning support for 16:9
- requires external, dedicated microcontroller (8 bit)
- 8-bit data bus, no serial data bus.
-
- LSI Logic L64112 successor (pin compatible)
- serial bus, 15 Mbit coded throughput.
- smaller pin-count version due soon.
-
- C-Cube CL-950 successor (?)
-
- In 1994, we can look forward to:
-
- Pioneer single-chip MPEG-2 successor to CD-1100 MPEG-1 chip set.
- IBM single-chip decoder.
-
- Q. Are there single chip MPEG encoders?
-
- A. Yes, the C-Cube CL-4000 is the only single-chip, real-time encoder
- that can process true MPEG-1 SIF rate video.
-
- Single chip for +/- 15 pel motion estimation at SIF rates (352x240x30 Hz)
- Two chips for +/- 32 pel at SIF rates (hierarchical)
- 5 or 6 chips for MPEG-2 at CCIR 601 rates (704 x 480 x 30 Hz)
- Highly microcoded architecture.
- Can code both H.261 and JPEG.
- Implements high picture quality microcode programs.
- [more details from CICC'93 and HotChips '93 conference to be included]
-
- IBM and SGS-Thomson plan to introduce more hard-wired, multichip
- solutions in 1994.
-
- Q. What about MPEG-1 decoder chips?
-
- A. By implication of MPEG-2 Conformance requirements, all MPEG-2 decoders are
- required to decode MPEG-1 bitstreams as well. These chips, however, are
- strictly MPEG-1:
-
-
- C-Cube CL-450 SIF rates. Single-chip. Has on-board CPU.
-
- SGS-Thomson 3400 SIF rates. Single-chip. Hardwired.
-
- Motorola MCD250 SIF rates. Single-chip.
-
- LSI 641172 CCIR 601 rates. Single-chip. Systems
- packet decoder on-chip.
-
- Q. What about audio chips?
- A. To date, only Layer I and Layer II have been implemented in dedicated
- (ASIC) silicon:
-
- Motorola MCD260
-
- Texas Instruments TI 320AV110
- hardwired with systems parsing
- operates in free format (arbitrary sample rate)
- 120 pin PQFP package
- Serial data port
- Part of technology exchange with C-Cube
-
- LSI Logic L64111
- hardwired w/CPU with on-chip systems parsing.
- Serial data port
- 100-pin PQFP
-
- GCA/ASCII ?
-
- Crystal Semiconductor CS4920
- on-chip, 2 channel 16-bit digital-to-analog convertor (DAC)
- 16 MIPS, 24-bit DSP
- programmable clock manager
- 44-pin PLCC package
- Programmable architecture. For example, can download Layer II
- MPEG-1 audio or Dolby AC-2
- $38 each in large quantities
-
-
- Dolby AC-3
- MPEG NY disclosure
- claimed to be less computationally intensive
- Zoran, GI working on own DSP-like dedicated chips.
-
- Q. Will there be an MPEG video tape format?
-
- A. There is a consortium of companies (Philips, JVC, Sony, Matsushita,
- et al) developing a metal particle based 6 millimeter consumer digital
- video tape format. It will initially use more JPEG-like independent
- frame compression for cheap encoding of source analog (NTSC, PAL)
- video. The consequence of course is less efficient use of bandwidth
- (25 Mbit/sec for the same quality achieved at 6 Mbit/sec with MPEG).
- Pre-compressed video from broadcast sources will be directly recorded
- to tape and "passed-through" as a coded bitstream to the video
- decompression "box" upon playback.
-
-
-
- Q. What do B-frames buy you?
- A. Since bi-directional macroblock predictions are an average of two
- macroblocks, noise is reduced at low bit rates. At nominal MPEG-1 video
- (352 x 240 x 30, 1.15 Mbit/sec) rates, it is said that B-frames improve
- SNR by as much as 2 dB. (A 0.5 dB gain is usually considered worthwhile
- in MPEG.) However, at higher bit rates, B-frames become less useful since
- they inherently do not contribute to the progressive refinement of an
- image sequence (i.e. they are not used as prediction by subsequent coded
- frames). Regardless, B-frames are still politically controversial.
-
-
- Q. Why do some people hate B-frames?
- A. Computational complexity, bandwidth, delay, and picture buffer size are
- the four B-frame Pet Peeves. Computational complexity is increased since
- some macroblock modes require averaging between two macroblocks.
- Worst case, memory bandwidth is increased an extra 16 MByte/s (601
- rate) for this extra prediction. An extra picture buffer is needed to
- store the future prediction reference (bi-directionality). Finally,
- extra delay is introduced in encoding since the frame used for backwards
- prediction needs to be transmitted to the decoder before the intermediate
- B-pictures can be decoded and displayed.
-
- Cable television (e.g. General Instruments) has been particularly
- averse to B-frames since the extra picture buffer pushes the decoder
- DRAM memory requirements past the magic 8-Mbit (1 Mbyte) threshold into the
- realm of 16 Mbits (2 MByte) for CCIR 601 frames (704 x 480), yet not for
- lowly 352 x 480. However, cable does not realize that DRAM does not come
- in convenient high-volume (low cost) 8-Mbit packages as 16-Mbit does. In
- a few years, the cost differences between 16 Mbit and 8 Mbit will become
- insignificant compared to the gain in compression. For the time being,
- cable boxes will start with 8-Mbit and allow future drop-in upgrades to
- 16-Mbit. The early market success of B-frames seems to have been
- determined by a fire at a Japanese chemical plant.
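-
- The buffer and bandwidth figures can be reproduced with simple
- arithmetic. The sketch assumes 8-bit 4:2:0 pictures, counts only the
- picture buffers (no coded-data buffer or other working storage), and
- takes 1 Mbit as 2**20 bits; all helper names are illustrative.

```python
def frame_bytes(width, height, chroma_factor=1.5):
    """Bytes in one 8-bit picture; 1.5 assumes 4:2:0 chroma."""
    return int(width * height * chroma_factor)


def buffers_mbit(width, height, n_buffers):
    """Size of n picture buffers in Mbit (1 Mbit = 2**20 bits)."""
    return n_buffers * frame_bytes(width, height) * 8 / 2**20


def prediction_read_mbytes_per_s(width, height, frame_rate):
    """Worst-case extra read traffic when every macroblock of every
    picture fetches a second prediction, in millions of bytes/sec."""
    return frame_bytes(width, height) * frame_rate / 1e6


two_refs = buffers_mbit(704, 480, 2)         # I/P only: about 7.7 Mbit
with_b = buffers_mbit(704, 480, 3)           # with B-frames: about 11.6 Mbit
half_width_with_b = buffers_mbit(352, 480, 3)  # about 5.8 Mbit
extra_traffic = prediction_read_mbytes_per_s(704, 480, 30)  # about 15.2
```

- Under these assumptions a third 704 x 480 picture buffer crosses the
- 8-Mbit line while three 352 x 480 buffers do not, and the extra
- prediction traffic comes out near the 16 MByte/s quoted above.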
-
- Q. How do MPEG and H.261 differ?
- A. H.261 was targeted for teleconferencing applications where motion
- is naturally more limited. Motion vectors are restricted to a range of
- +/- 15 pixels. Accuracy is reduced since H.261 motion vectors are
- restricted to integer-pel accuracy. Other syntactic differences
- include: no B-pictures, different quantization method.
-
- H.261 is also known as P*64. "P" is an integer number meant to
- represent multiples of 64kbit/sec. In the end, this nomenclature
- probably won't be used as many services other than video will adopt the
- philosophy of arbitrary B channel (64kbit) bitrate scalability.
-
- Q. Is H.261 the de facto teleconferencing standard?
-
- A. Not exactly. To date, about seventy percent of the industrial
- teleconferencing hardware market is controlled by PictureTel of Mass.
- The second largest market controller is Compression Labs of Silicon
- Valley. PictureTel hardware includes compatibility with H.261 as a
- lowest common denominator, but when in communication with other
- PictureTel hardware, it can switch to a mode superior at low bit rates
- (less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is done
- at two-times switched 56 channel (~P = 2) bandwidth. Long distance ISDN
- ain't cheap. In each direction, video and audio are coded at an
- aggregate of 112 kbits/sec (2*56 kbits/sec).
-
- The PictureTel proprietary compression algorithm is acknowledged to
- be a combination of spatial pyramid, lattice vector quantizer, and an
- unidentified entropy coding method. Motion compensation is considerably
- more refined and sophisticated than the 16x16 integer-pel block method
- specified in H.261.
-
- The Compression Labs proprietary algorithm also offers significant
- improvement over H.261 when linked to other CLI hardware.
-
- Currently, ITU-TS (International Telecommunication Union, Telecommunication
- Standardization Sector), formerly CCITT, is quietly defining an improvement to H.261 with
- Sector), formerly CCITT, is quietly defining an improvement to H.261 with
- the participation of industry vendors.
-
- Q. Where will we see MPEG in everyday life?
- A. Just about wherever you see video today.
-
- DBS (Direct Broadcast Satellite)
- The Hughes/USSB DBS service will use MPEG-2 video and audio. Thomson
- has exclusive rights to manufacture the decoding boxes for the first
- 18 months of operation. No doubt Thomson's STi-3500 MPEG-2 video
- decoder chip will be featured.
-
- Hughes/USSB DBS will begin service in North America in April 1994.
- Two satellites at 101 degrees West will share the power requirements
- of 120 Watts per 27 MHz transponder. Multi-source channel rate
- control methods will be employed to optimally allocate bits between
- several programs on one data carrier. An average of 150 channels are
- planned.
-
-
- CATV (Cable Television)
- Despite conflicting options, the cable industry has more or less
- settled on MPEG-2 video. Audio is less than settled. For example,
- General Instruments (the largest U.S. consumer cable set-top box
- manufacturer) have announced the planned use of the Dolby AC-3
- audio algorithm.
-
- The General Instruments DigiCipher I video syntax is similar to MPEG-2
- syntax but uses smaller macroblock predictions and no B-frames. The
- DigiCipher II specification will include modes to support both the GI
- and full MPEG-2 Video Main Profile syntax. Services such as HBO will
- upgrade to DigiCipher II in 1994.
-
- HDTV
- The U.S. Grand Alliance, a consortium of companies that formerly competed
- for the U.S. terrestrial HDTV standard, have already agreed to use
- the MPEG-2 Video and Systems syntax---including B-pictures. Both interlaced
- (1440 x 960 x 30 Hz) and progressive (1280 x 720 x 60 Hz) modes will
- be supported. The Alliance must then settle upon a modulation (QAM,
- VSB, OFDM), convolution (MS or Viterbi), and error correction (RSPC, RSFC)
- specification.
-
- In September 1993, the consortium of 85 European companies signed an
- agreement to fund a project known as Digital Video Broadcasting (DVB) which
- will develop a standard for cable and terrestrial transmission by the
- end of 1994. The scheme will use MPEG-2. This consortium has put the
- final nail in the coffin of the D-MAC scheme for gradual migration
- towards an all-digital, HDTV consumer transmission standard. The only
- remaining analog or digital-analog hybrid system left in the world is
- NHK's MUSE (which will probably be axed in a few years).
-
- Q. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
- A. Here is a brief summary:
-
- Sequence layer:
- More aspect ratios. A minor, yet necessary part of the syntax.
-
- Horizontal and vertical dimensions are now required to be a multiple of
- 16 in frame coded pictures, and the vertical dimension must be a multiple
- of 32 in field coded pictures.
-
- 4:2:2 and 4:4:4 macroblocks were added in the Next profiles.
-
- Syntax can now signal frame sizes as large as 16383 x 16383.
-
- Syntax signals source video type (NTSC, PAL, SECAM, MAC, component) to
- help post-processing and display.
-
- Source video color primaries (709, 170M, 240M, D65, etc.) and opto-
- electronic transfer characteristics (709, 624-4M, 170M etc.) can be
- indicated.
-
- Four scalable modes [see scalable section below]
-
- Picture layer:
- All MPEG-2 motion vectors are half-pel accuracy.
-
- DC precision can be user-selected as 8, 9, 10, or 11 bits.
-
- Concealment motion vectors were added to I-pictures in order to
- increase robustness from bit errors since I pictures are the most
- critical and sensitive in a group of pictures.
-
- A non-linear macroblock quantization factor that results in a more
- dynamic step size range, from 0.5 to 56, than in MPEG-1 (1 to 32).
-
- New Intra-VLC table for dct_next_coefficient (AC run-level events)
- that is more geared towards I-frame probability distribution. EOB
- is 4 bits. The old tables are still included.
-
- Alternate scanning pattern that (supposedly) improves entropy coding
- performance over the original Zig-Zag scan used in H.261, JPEG, and
- MPEG-1. The extra scanning pattern is geared towards interlaced
- video.
-
- Syntax to signal 3:2 pulldown process (repeat_first_field flag)
-
- Syntax flag to signal chrominance post processing type (4:2:0 to
- 4:2:2 upsampling conversion)
-
- Progressive and interlaced frame coding
-
- Syntax to signal source composite video characteristics useful in
- post-processing operations. (v-axis, field sequence, sub_carrier,
- phase, burst_amplitude, etc.)
-
- Pan & scanning syntax that tells decoder how to, for example, window a
- 4:3 image within a wider 16:9 aspect ratio image. Vertical pan offset
- has 1/16th pixel accuracy.
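-
- The 3:2 pulldown signalling in the list above can be sketched as
- follows. The sketch assumes the picture-level flags top_field_first
- and repeat_first_field (the spelling used in the MPEG-2 video syntax)
- and models only the display order of fields; the function name and the
- FILM_CADENCE table are illustrative.

```python
def pulldown_fields(flags):
    """Expand (top_field_first, repeat_first_field) pairs into fields.

    Each coded frame is shown as two fields, or three when
    repeat_first_field is set. A field is labelled '<frame><T|B>'.
    """
    fields = []
    for index, (tff, rff) in enumerate(flags):
        first, second = ("T", "B") if tff else ("B", "T")
        shown = [first, second] + ([first] if rff else [])
        fields.extend(f"{index}{parity}" for parity in shown)
    return fields


# Four film frames mapped onto ten video fields (24 -> 60 fields/sec).
FILM_CADENCE = [(1, 1), (0, 0), (0, 1), (1, 0)]
fields = pulldown_fields(FILM_CADENCE)
```

- With this cadence the field parity keeps alternating top/bottom, so the
- display never sees two fields of the same parity in a row.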
-
- Macroblock layer:
- Macroblock stuffing is now illegal in MPEG-2 (hurray!!)
-
- Two line modes (interlaced and progressive) for DCT operation.
-
- Now only one run-level escape code (24-bits) instead of
- the single (20-bits) and double escape (28-bits) in MPEG-1.
-
- Improved mismatch control in quantization over the original oddification
- method in MPEG-1. Now specifies adding or subtracting one to the
- 63rd AC coefficient depending on parity of summed quantized coefficients.
-
- Many additional prediction modes (16x8 MC, field MC, Dual Prime)
- and, correspondingly, macroblock modes.
-
- Overall, MPEG-2's greatest compression improvements over MPEG-1 are:
- prediction modes, Intra VLC table, DC precision, non-linear macroblock
- quant. Implementation improvements, well,.. uh... macroblock stuffing
- was eliminated.
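-
- The two mismatch control methods mentioned in the macroblock layer
- list can be contrasted in a few lines. This is a sketch from memory of
- the two video specifications, not normative text: both functions are
- shown acting on reconstructed (dequantized) coefficients, and the
- function names are made up.

```python
def oddify_mpeg1(coefficient):
    """MPEG-1 style: move an even, non-zero value one step toward zero."""
    if coefficient != 0 and coefficient % 2 == 0:
        return coefficient - 1 if coefficient > 0 else coefficient + 1
    return coefficient


def mismatch_control_mpeg2(block):
    """MPEG-2 style: if the sum of all 64 coefficients is even, add or
    subtract one at the last coefficient, block[7][7], so the sum is odd."""
    total = sum(sum(row) for row in block)
    if total % 2 == 0:
        last = block[7][7]
        # Odd value: subtract one. Even value: add one.
        block[7][7] = last - 1 if last % 2 else last + 1
    return block
```

- The MPEG-2 method touches at most one coefficient per block instead of
- biasing every coefficient.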
-
- Q. What are the scalable modes of MPEG-2?
- A. Scalable video is permitted only in the Main+ and Next profiles.
- Currently, there are four scalable modes in the MPEG-2 toolkit.
- These modes break MPEG-2 video into different layers (base, middle,
- and high layers) mostly for purposes of prioritizing video data. For
- example, the high priority channel (bitstream) can be coded with a
- combination of extra error correction information and decreased bit
- error (i.e. higher Carrier-to-Noise ratio or signal strength) than
- the lower priority channel.
-
- Another purpose of scalability is complexity division. For example,
- in HDTV, the high priority bitstream (720 x 480) can be decoded
- under noise conditions where the lower priority (1440 x 960) cannot.
- This is "graceful" degradation. By the same division however, a
- standard TV set need only decode the 720 x 480 channel, thus requiring
- a less expensive decoder than a TV set wishing to display 1440 x 960.
- This is simulcasting.
-
- A brief summary of the MPEG-2 video scalability modes:
- [better descriptions in installment 3]
-
- Spatial Scalability-- Useful in simulcasting, and for feasible software
- decoding of the lower resolution, base layer. This spatial domain
- method codes a base layer at lower sampling dimensions (i.e. "resolution")
- than the upper layers. The upsampled reconstructed lower (base) layers
- are then used as prediction for the higher layers.
-
- Data Partitioning-- Similar to JPEG's frequency progressive mode, only
- the slice layer indicates the maximum number of block transform
- coefficients contained in the particular bitstream (known as the
- "priority break point"). Data partitioning is a frequency domain method
- that breaks the block of 64 quantized transform coefficients into two
- bitstreams. The first, higher priority bitstream contains the more
- critical lower frequency coefficients and side informations (such as DC
- values, motion vectors). The second, lower priority bitstream carries
- higher frequency AC data.
-
- SNR Scalability-- Similar to the point transform in JPEG, SNR scalability
- is a spatial domain method where channels are coded at identical sample
- rates, but with differing picture quality (through quantization step sizes).
- The higher priority bitstream contains base layer data that can be added
- to a lower priority refinement layer to construct a higher quality picture.
-
- Temporal Scalability--- A temporal domain method useful in, e.g.,
- stereoscopic video. The first, higher priority bitstream codes video
- at a lower frame rate, and the intermediate frames can be coded in a
- second bitstream using the first bitstream reconstruction as prediction.
- In stereoscopic vision, for example, the left video channel can be
- predicted from the right channel.
-
- Other scalability modes were experimented with in MPEG-2 video (such as
- Frequency Scalability), but were eventually dropped in favor of methods
- that demonstrated similar quality and greater simplicity.
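-
- Two of the modes summarized above, data partitioning and SNR
- scalability, are easy to sketch on a vector of transform coefficients.
- The code is a toy model under stated assumptions: plain lists instead
- of bitstreams, truncating quantizers, and made-up function names.

```python
def partition(zigzag_coeffs, priority_break_point):
    """Split a zig-zag vector into high- and low-priority parts."""
    return (zigzag_coeffs[:priority_break_point],
            zigzag_coeffs[priority_break_point:])


def merge(high, low=None, size=64):
    """Rebuild the vector; a lost low-priority part decodes as zeros."""
    if low is None:
        low = [0] * (size - len(high))
    return list(high) + list(low)


def snr_layers(coeffs, coarse_step, fine_step):
    """Base layer: coarse levels. Refinement: finely quantized residual."""
    base = [int(c / coarse_step) for c in coeffs]
    residual = [c - b * coarse_step for c, b in zip(coeffs, base)]
    refinement = [int(r / fine_step) for r in residual]
    return base, refinement


def snr_reconstruct(base, refinement, coarse_step, fine_step):
    """Decode the base layer alone (refinement=None) or both layers."""
    if refinement is None:
        refinement = [0] * len(base)
    return [b * coarse_step + e * fine_step
            for b, e in zip(base, refinement)]
```

- In both cases a decoder that receives only the high priority stream
- still produces a picture, just a coarser one.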
-
- Q. What is all the fuss with cositing of chroma components?
- A. It is important to properly co-site chroma samples, otherwise chroma
- shifting may result.
- [insert more details in installment 3]
-
- Q. What is the reasoning behind MPEG syntax symbols?
- A. Here are some of the Whys and Wherefores of MPEG symbols:
-
- Start codes
- These 32-bit byte-aligned codes provide a mechanism for cheaply searching
- coded bitstreams for commencement of various layers of video without having
- to actually parse or decode. Start codes also provide a mechanism for
- resynchronization in the presence of bit errors.
-
- Coded block pattern (CBP --not to be confused with Constrained Parameters!)
- When the frame prediction is particularly good, the displaced
- frame difference (DFD, or prediction error) tends to be small, often
- with entire block energy being reduced to zero after quantization. This
- usually happens only at low bit rates. Coded block patterns prevent
- the need for transmitting EOB symbols in those zero coded blocks.
-
- DCT_coefficient_first
- Each intra coded block has a DC coefficient. Inter coded blocks
- (prediction error or DFD) naturally do not since the prediction error
- is the first derivative of the video signal. With coded block patterns
- signalling all possible non-coded block patterns, the dct_coef_first
- mechanism assigns a different meaning to the VLC codeword that would
- otherwise represent EOB as the first coefficient.
-
- End of Block
- Saves unnecessary run-length codes. At optimal bitrates, there tends to be
- few AC coefficients concentrated in the early stages of the zig-zag vector.
- In MPEG-1, the 2-bit length of EOB implies that there is an average of only
- 3 or 4 non-zero AC coefficients per block. In MPEG-2 Intra (I) pictures,
- with a 4-bit EOB code, this number is between 9 and 16 coefficients.
- Since EOB is required for all coded blocks, its absence can signal that a
- syntax error has occurred in the bitstream.
-
- Macroblock stuffing
- A genuine pain for VLSI implementations, macroblock stuffing was introduced
- to maintain smoother, constant bitrate control in MPEG-1. However, with
- normalized complexity measures and buffer management performed on an
- a priori (pre-frame, pre-slice, and pre-macroblock) basis in the MPEG-2
- encoder test model, the need for such localized smoothing evaporated.
- Stuffing can be achieved through virtually unlimited slice start code
- padding if required. A good rule of thumb: if you find yourself often
- using stuffing more than once per slice, you probably don't have a very
- good rate control algorithm. Anyway, macroblock stuffing is now illegal in
- MPEG-2.
-
-
- MPEG's modified Huffman VLC tables
- The VLC tables in MPEG are not Huffman tables in the true sense of
- Huffman coding, but are more like the tables used in Group 3 fax.
- They are entropy constrained, that is, non-downloadable and optimized
- for a limited range of bit rates (sweet spots). With the exception of
- a few codewords, the larger tables were carried over from the H.261
- standard of 1990. MPEG-2 added an "Intra table". Note that the
- dct_coefficient tables assume positive/negative coefficient pmf symmetry.
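-
- The link between EOB code length and coefficients per block (see End
- of Block above) follows from a rule of thumb: in a well-matched prefix
- code, a codeword of L bits stands for a symbol of probability about
- 2**-L. The sketch below applies that rule under an assumed simple
- model in which each symbol is independently EOB with that probability;
- the function name is illustrative.

```python
def expected_events_per_block(eob_bits):
    """Expected run-level events before each EOB, assuming the EOB
    codeword length matches its probability (p = 2**-eob_bits) and
    symbols are independent (a geometric model)."""
    p_eob = 2.0 ** -eob_bits
    return (1.0 - p_eob) / p_eob


two_bit_eob = expected_events_per_block(2)   # 2-bit EOB code
four_bit_eob = expected_events_per_block(4)  # 4-bit EOB code (Intra table)
```

- The model gives about 3 events per block for a 2-bit EOB and about 15
- for a 4-bit EOB, in line with the ranges quoted above.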
-
-
- Q. What is the TM rate control and adaptive quantization technique ?
- A. Test model was not by any stretch of the imagination meant to
- be the show-stopping, best set of algorithms. It was designed to
- exercise the syntax, verify proposals, and test the *relative*
- performance of proposals in a way that could be duplicated
- by co-experimenters in a timely fashion. Otherwise there would
- be more endless debates about model interpretation than actual
- time spent in verification.
-
- [MPEG-2 Test model is frozen as v5b]
-
- The MPEG-2 Test Model (TM) rate control method offers a dramatic
- improvement over the Simulation Model (SM) method used for MPEG-1. TM's
- improvements are due to more sophisticated pre-analysis and post-analysis
- routines.
-
- Rate control and adaptive quantization are divided into three steps:
-
- Step One: Bit Allocation
-
- In Complexity Estimation, the global complexity measures assign relative
- weights to each picture type. These weights (Xi, Xp, Xb) are reflected
- by the typical coded frame size of I, P, and B pictures (see typical frame
- size section). I pictures are assigned the largest weight since they have
- the greatest stability factor in an image sequence. B pictures are assigned
- the smallest weight since B data does not propagate into other frames
- through the prediction process.
-
- Picture Target Setting allocates target bits for a frame based on
- the frame type and the remaining number of frames of that same
- type in the Group of Pictures (GOP).
-
-
- Step Two: Rate Control
-
- Rate control attempts to adjust bit allocation if there is
- significant difference between the target bits (anticipated
- bits) and actual coded bits for a block of data.
-
- [more detail in installment 3]
-
- Step Three: Adaptive Quantization
-
- Recomputes macroblock quantization factor according to
- activity of block against the normalized activity of the
- frame.
-
- The effect of this step is to roughly assign a constant number
- of bits per macroblock (this results in more perceptually uniform
- picture quality).
-
- [more detail in installment 3]
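-
- The three steps can be sketched in code. The formulas below are not
- given in this FAQ; they are recalled from the published Test Model 5
- description and should be checked against that document before being
- relied on. The constants K_P and K_B and all function and parameter
- names are part of that assumption.

```python
K_P, K_B = 1.0, 1.4  # assumed TM5 constants for P and B pictures


def global_complexity(bits_used, average_quantizer):
    """Step 1: complexity X of the last coded picture of a given type."""
    return bits_used * average_quantizer


def picture_target(picture_type, remaining_bits, n_p, n_b,
                   x_i, x_p, x_b, bit_rate, picture_rate):
    """Step 1: target bits for the next picture. n_p and n_b are the
    P and B pictures remaining in the GOP; remaining_bits is the GOP
    bit budget still unspent."""
    floor = bit_rate / (8.0 * picture_rate)
    if picture_type == "I":
        denom = 1.0 + n_p * x_p / (x_i * K_P) + n_b * x_b / (x_i * K_B)
    elif picture_type == "P":
        denom = n_p + n_b * K_P * x_b / (K_B * x_p)
    else:
        denom = n_b + n_p * K_B * x_p / (K_P * x_b)
    return max(remaining_bits / denom, floor)


def reference_quantizer(initial_fullness, bits_so_far, target_bits,
                        mbs_coded, mb_count, bit_rate, picture_rate):
    """Step 2: a virtual buffer tracks actual versus scheduled bits."""
    fullness = (initial_fullness + bits_so_far
                - target_bits * mbs_coded / mb_count)
    reaction = 2.0 * bit_rate / picture_rate
    return fullness * 31.0 / reaction


def normalized_activity(activity, average_activity):
    """Step 3: maps macroblock activity into the range 0.5 to 2."""
    return ((2.0 * activity + average_activity)
            / (activity + 2.0 * average_activity))


def macroblock_quantizer(reference_q, activity, average_activity):
    """Step 3: final quantizer scale, clipped to 1..31."""
    mquant = round(reference_q * normalized_activity(activity,
                                                     average_activity))
    return max(1, min(31, mquant))
```

- Busy macroblocks (high activity) get a coarser quantizer and flat ones
- a finer quantizer, which is where the perceptual uniformity comes from.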
-
-
- Q. How would you explain MPEG to the data compression expert?
- A. MPEG video is a block-based video scheme
- Local decorrelations via DCT-Q-VLC hybrid
- Dead-zone quantizer
- DFD: quantized prediction error
- [etc. More in installment 3]
-
- Q. What are the implementation requirements?
- A. MPEG pushes the limit of economical VLSI technology (but you get
- what you pay for in terms of picture quality or compaction efficiency)
-
- Video               Typical decoder       Total     DRAM bus width
- Profile             transistor count      DRAM      @ speed
- ------------------  --------------------  --------  ----------------
- MPEG-1 CPB          0.4 to 0.75 million   4 Mbit    16 bits @ 80 ns
- MPEG-1 601          0.8 to 1.1 million    16 Mbit   64 bits @ 80 ns
- MPEG-2 MP@ML        0.9 to 1.5 million    16 Mbit   64 bits @ 80 ns
- MPEG-2 MP@High1440  2 to 3 million        64 Mbit   N/A
-
- 70 or 80ns DRAM speed is a measure of the shortest period in which
- words can be transferred across the bus. In the case of MPEG-1 SIF,
- 80ns implies (1/80ns)(16bits) or about 25 MBytes/sec of bandwidth.
- Lack of cheap memory (DRAM) utilization is where the original DVI
- algorithm made a costly mistake. DVI required expensive VRAM/SRAM
- chips (a static RAM cell requires 6 transistors compared to
- 1 transistor for DRAM). Fast page mode DRAM (which has slower
- throughput than SRAM and requires near-contiguous address mapping)
- is viable for MPEG due almost exclusively to the block nature of
- the algorithm and syntax (DRAM memory locations are broken into
- rows and columns).
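-
- The bandwidth figure is just the bus width multiplied by the word
- rate. A small sketch of the arithmetic (peak rate only, ignoring
- refresh and page-miss overhead; the function name is illustrative):

```python
def dram_bandwidth_mbytes_per_s(bus_width_bits, cycle_ns):
    """Peak transfer rate in millions of bytes/sec for one word per cycle."""
    words_per_second = 1e9 / cycle_ns
    return words_per_second * bus_width_bits / 8 / 1e6


sif_bus = dram_bandwidth_mbytes_per_s(16, 80)   # 16-bit bus at 80 ns
wide_bus = dram_bandwidth_mbytes_per_s(64, 80)  # 64-bit bus at 80 ns
```

- The 16-bit bus gives the 25 MBytes/sec quoted above; the 64-bit bus
- listed for the larger profiles gives four times that.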
-
- Q. Is exhaustive search "optimal" ?
- A. Definitely not in the context of block-based MCP. Since one motion
- vector represents the prediction of 256 pixels, divergent pixels within
- the macroblock are misrepresented by the "global" vector. This leads
- back to the general philosophy of block-based coding as an approximation
- technique. Exhuastive search may find blocks with the least distortion
- (displaced frame difference) but will not produce motion vectors with
- the least entropy. [more details later]
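-
- For reference, a minimal sketch of exhaustive (full) search follows.
- It uses the sum of absolute differences (SAD) as the distortion
- measure, which is an assumption (the text only says "least
- distortion"). The function names, the small 4 x 4 block and the
- synthetic test pattern are all illustrative. Note that the search
- minimizes distortion only and knows nothing about the cost of coding
- the vector, which is exactly the limitation described above.
-

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block
    of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def exhaustive_search(cur, ref, bx, by, n, search_range):
    """Try every integer displacement within +/- search_range.
    Returns (best_sad, (dx, dy)); candidates outside `ref` are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if (bx + dx < 0 or by + dy < 0
                    or bx + dx + n > width or by + dy + n > height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 16 x 16 reference picture (hypothetical test pattern).
ref = [[3 * x + 5 * y + x * y for x in range(16)] for y in range(16)]
# Current picture: the reference content moved by 2 pels horizontally
# and 1 pel vertically (edges clamped).
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)]
       for y in range(16)]

best_cost, best_vector = exhaustive_search(cur, ref, bx=4, by=4, n=4,
                                           search_range=3)
print(best_cost, best_vector)
```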
-
- Q. What is a good motion estimation method, then?
- A. When shopping for motion vectors, the three basic characteristics are:
- search range, search pattern, and matching criterion. Search pattern
- has the greatest impact on finding the best vector. Hierarchical
- search patterns first find the best match between downsampled images of
- the reference and target pictures and then refine the vector through
- progressively higher resolutions. Hierarchical patterns are less
- likely to mistake an extremely local distortion minimum for
- the best match.
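-
- The coarse-to-fine idea can be sketched as below. This is an
- illustration under stated assumptions, not a Test Model procedure:
- downsampling is a plain 2 x 2 average, the matching criterion is SAD,
- and each refinement step searches only +/- 1 pel around the scaled-up
- coarse vector. All function names and the test pattern are made up.
-

```python
def downsample(img):
    """Halve the resolution by averaging each 2 x 2 neighbourhood."""
    return [[(img[y][x] + img[y][x + 1]
              + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for one candidate displacement."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def search(cur, ref, bx, by, n, centre, radius):
    """Best (dx, dy) within +/- radius of `centre`, skipping candidates
    that fall outside the reference picture."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, centre
    for dy in range(centre[1] - radius, centre[1] + radius + 1):
        for dx in range(centre[0] - radius, centre[0] + radius + 1):
            if (bx + dx < 0 or by + dy < 0
                    or bx + dx + n > width or by + dy + n > height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def hierarchical_search(cur, ref, bx, by, n, levels, radius):
    """Search the coarsest level with +/- radius, then refine by
    +/- 1 pel at each finer level."""
    if levels == 0:
        return search(cur, ref, bx, by, n, (0, 0), radius)
    cdx, cdy = hierarchical_search(downsample(cur), downsample(ref),
                                   bx // 2, by // 2, n // 2,
                                   levels - 1, radius)
    # A coarse vector covers twice the distance at the finer level.
    return search(cur, ref, bx, by, n, (2 * cdx, 2 * cdy), 1)


# Synthetic 32 x 32 pictures: content moved by (4, 2) pels.
ref = [[3 * x + 5 * y + x * y for x in range(32)] for y in range(32)]
cur = [[ref[min(y + 2, 31)][min(x + 4, 31)] for x in range(32)]
       for y in range(32)]

vector = hierarchical_search(cur, ref, bx=8, by=8, n=8, levels=1, radius=3)
print(vector)
```

- With one coarse level and a radius of 3, the coarse stage covers
- +/- 6 pels at full resolution while evaluating far fewer candidates
- than a full search over the same range.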
-
- [Accuracy vs. Ambiguity]
-
- [Some ways of solving problem (Gary Sullivan--ICASSP '93), but not
- syntactically compatible].
-
- [motion vector pre-frame search, motion vector refinement, etc.
- in installment 3]
-
- Q. What are MPEG 1.5 and MPEG++?
- A. MPEG-1.5 was not exactly a proprietary twist in terms of syntax,
- but of operating parameters. Again, people (erroneously) consider MPEG-1
- to be limited to SIF rates (352 x 240 x 30 Hz). After interrogation,
- most MPEG 1.5 proponents will confess that MPEG 1.5 is simply MPEG-1 at
- CCIR 601 rates (704 x 480 x 30 Hz) and that it may or may not include
- B-frames. It was meant to be an interim solution for cable TV until
- MPEG-2 chips became available.
-
- MPEG++ is/was proprietary only at the transport layer (compatible syntax
- at the video layer). This name was coined by the Sarnoff/Philips/
- RCA/Thomson HDTV consortium.
-
- Both MPEG 1.5 and MPEG++ are now moot since MPEG-2 Simple profile and
- MPEG-2 Systems layer fill these roles, respectively.
-
-
- Q. What about MPEG-2 audio?
- A. MPEG-2 audio attempts to maintain as much compatibility with
- MPEG-1 audio syntax as possible, while adding discrete surround-sound
- channels to the original MPEG-1 limit of 2 channels (Left, Right or
- matrix center and difference). The main channels (Left, Right) in
- MPEG-2 audio will remain backwards compatible, whereas new coding
- methods and syntax will be used for the surround channels.
-
- A total of 5.1 channels are included that consist of the two main
- channels (L,R), two side/rear, center, and a 100 Hz special effects
- channel (hence the ".1" in "5.1").
-
- At this time, non-backwards compatible (NBC) schemes are being
- considered as an amendment to the MPEG-2 audio standard. One
- such popular system is Dolby AC-3.
-
- [installment 3: detail on Layers, AC-3, etc., optimal bitrates.]
-
- Q. What about MPEG-2 systems?
- A. [to be filled out in installment 3]
- Transport stream
- Program stream
- ATM
- PES
- Timing Recovery
-
- Q. How many bitstreams can MPEG-2 systems represent?
- A. [installment 3]
-
-
- Q. What are the typical MPEG-2 bitrates and picture quality?
- A. [examples of typical frame sizes in bits]
-
-                      Picture type
-                      I         P         B        Average
- MPEG-1 SIF
- @ 1.15 Mbit/sec      150,000   50,000    20,000   38,000
-
- MPEG-2 601
- @ 4.00 Mbit/sec      400,000   200,000   80,000   130,000
-
- Note: parameters assume Test Model for encoding, I frame distance of 15
- (N = 15), and a P frame distance of 3 (M = 3).
-
- Of course with scene changes and more advanced encoder models found
- in any real-world implementation, these numbers can be very different.
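-
- As a rough cross-check of the Average column, the sketch below
- weights the I, P and B sizes by how often each picture type occurs
- in one N = 15, M = 3 cycle, and compares the result with the bits
- available per picture at the stated bit rate. The 30 Hz picture rate
- is an assumption taken from the SIF and 601 rates quoted elsewhere in
- this FAQ, and the function names are illustrative.
-

```python
def gop_counts(n, m):
    """Number of I, P and B pictures in one cycle with I-frame
    distance n and P-frame distance m."""
    i_count = 1
    p_count = n // m - 1
    b_count = n - i_count - p_count
    return i_count, p_count, b_count


def average_bits(i_bits, p_bits, b_bits, n=15, m=3):
    """Average picture size over one cycle, weighted by picture type."""
    i_count, p_count, b_count = gop_counts(n, m)
    return (i_count * i_bits + p_count * p_bits + b_count * b_bits) / n


PICTURE_RATE_HZ = 30  # assumption: 30 Hz source

mpeg1_avg = average_bits(150_000, 50_000, 20_000)
mpeg2_avg = average_bits(400_000, 200_000, 80_000)

print(gop_counts(15, 3))
print(round(mpeg1_avg), "vs budget", round(1.15e6 / PICTURE_RATE_HZ))
print(round(mpeg2_avg), "vs budget", round(4.00e6 / PICTURE_RATE_HZ))
```

- The weighted averages land within a few percent of the table's
- rounded Average column, which is as close as round example figures
- can be expected to get.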
-
- Q. At what bitrates is MPEG-2 video optimal?
- A. The Test subgroup has defined a few examples:
-
- "Sweet spot" sampling dimensions and bit rates for MPEG-2:
-
- Dimensions       Coded rate   Comments
- -------------    ----------   -------------------------------------------
- 352x480x24 Hz    2 Mbit/sec   Half horizontal 601. Looks almost NTSC
- (progressive)                 broadcast quality, and is a good (better)
-                               substitute for VHS. Intended for film source.
-
- 544x480x30 Hz    4 Mbit/sec   PAL broadcast quality (nearly full capture
- (interlaced)                  of 5.4 MHz luminance carrier). Also
-                               4:3 image dimensions windowed within 720
-                               sample/line 16:9 aspect ratio via pan&scan.
-
- 704x480x30 Hz    6 Mbit/sec   Full CCIR 601 sampling dimensions.
- (interlaced)
-
- [these numbers subject to change at whim of MPEG Test subgroup]
-
-
- Q. How does MPEG video really compare to TV, VHS, laserdisc ?
- A. VHS picture quality can be achieved for source film video at about
- 1 million bits per second (with proprietary encoding methods). It is
- very difficult to objectively compare MPEG to VHS. The response curve
- of VHS places -3 dB at around 2 MHz of analog luminance bandwidth
- (equivalent to 200 samples/line). VHS chroma is considerably less dense
- in the horizontal direction than MPEG source video (compare 80 samples/
- line to 176!). From a sampling density perspective, VHS is superior only
- in the vertical direction (480 lines compared to 240)... but when taking
- into account interfield magnetic tape crosstalk and the TV monitor Kell
- factor, not by all that much. VHS is prone to timing errors (which can be
- improved with time base correctors), whereas digital video is fully
- discretized. Pre-recorded VHS is typically recorded at very high
- duplication speeds (5 to 15 times real time playback), which leads to
- further shortfalls for the format that has been with us since 1977.
-
- Broadcast NTSC quality can be approximated at about 3 Mbit/sec, and PAL
- quality at about 4 Mbit/sec. Of course, sports sequences with complex
- spatial-temporal activity need more like 5 and 6 Mbit/sec, respectively.
-
- Laserdisc is a tough one to compare. Disc is composite video (NTSC
- or PAL) with up to 425 TVL (or 567 samples/line) response. Thus it
- could be said laserdisc has 567 x 480 x 30 Hz "resolution". The
- carrier-to-noise ratio is typically better than 48 dB. Timing is
- excellent. Yet some of the clean characteristics of laserdisc can be
- achieved at 1.15 Mbit/sec (SIF rates), especially for those areas of
- medium detail (low spatial activity) in the presence of uniform motion.
- This is why some people say MPEG-1 video at 1.15 Mbit/sec looks almost
- as good as Laserdisc or Super VHS.
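-
- The TVL-to-samples conversion used for laserdisc above can be
- reproduced as follows. The sketch assumes that TV lines (TVL) are
- counted per picture height, so the figure is scaled by the 4:3
- aspect ratio to get samples across the full line width. The function
- name is illustrative.
-

```python
def tvl_to_samples_per_line(tvl, aspect_ratio=4 / 3):
    """Convert horizontal resolution in TV lines (assumed to be counted
    per picture height) to samples across the full picture width."""
    return tvl * aspect_ratio


laserdisc_samples = round(tvl_to_samples_per_line(425))
print(laserdisc_samples)
```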
-
- Regardless of the above figures, those clever proprietary encoding
- algorithms can push these bitrates even lower.
-
-
- Q. Why does film do so well with MPEG?
- A. Several reasons, really:
-
- 1) The frame rate is 24 Hz (instead of 30 Hz), which is a savings of
- some 20%.
- 2) The film source video is inherently progressive. Hence no fussy
- interlaced spectral frequencies.
- 3) The pre-digital source was severely oversampled (compare 352 x 240
- SIF to 35 millimeter film at, say, 3000 x 2000 samples). This can
- result in a very high quality signal, whereas most video cameras do
- not oversample, especially in the vertical direction.
- 4) Finally, the spatial and temporal modulation transfer function (MTF)
- characteristics (motion blur, etc.) of film are more amenable to
- the transform and quantization methods of MPEG.
-
- Q. What is the best compression ratio for MPEG ?
- A. The MPEG sweet spot is about 1.2 bits/pel intra and 0.35 bits/pel inter.
- Experimentation has shown that intra frame coding with the familiar
- DCT-Quantization-Entropy hybrid algorithm achieves optimal performance
- at about an average of 1.2 bits/sample or about 6:1 compression ratio.
- Below this point, artifacts become noticeable.
-
-
- Q. What are some pre-processing enhancements ?
-
- Adaptive de-interlacing:
- This method maps interlaced video from a higher sampling rate (e.g.
- 720 x 480) into a lower rate, progressive format (352 x 240). The
- most basic algorithm measures the variance between two fields, and if
- the variance is small enough, uses an average of both fields to form a
- frame macroblock. Otherwise, a field area from one field (of the same
- parity) is selected. More clever algorithms are much more complex
- than this, and may involve median filtering, and multirate/
- multidimensional tools.
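-
- The "most basic algorithm" above might be sketched like this. Several
- details are assumptions: the "variance between two fields" is taken
- to be the mean squared difference between co-located lines of the
- two fields, the threshold value is arbitrary, and only the vertical
- (field-merging) step is shown, not the horizontal downsampling from
- 720 to 352 samples. The function name is made up.
-

```python
def deinterlace_block(block, threshold):
    """block: rows of an interlaced frame area with an even number of
    rows (even rows = top field, odd rows = bottom field).
    Returns a progressive block with half as many rows."""
    top = block[0::2]
    bottom = block[1::2]

    # Mean squared difference between the two fields (assumed measure).
    count = sum(len(row) for row in top)
    msd = sum((t - b) ** 2
              for top_row, bottom_row in zip(top, bottom)
              for t, b in zip(top_row, bottom_row)) / count

    if msd <= threshold:
        # Fields agree (little motion): average them.
        return [[(t + b) // 2 for t, b in zip(top_row, bottom_row)]
                for top_row, bottom_row in zip(top, bottom)]
    # Fields differ (motion between fields): keep one field only.
    return [row[:] for row in top]


static_area = [[10, 10], [12, 12], [20, 20], [22, 22]]
moving_area = [[10, 10], [200, 200], [20, 20], [210, 210]]

merged = deinterlace_block(static_area, threshold=16)
single_field = deinterlace_block(moving_area, threshold=16)
print(merged)
print(single_field)
```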
-
- Pre-anti-aliasing and Pre-blockiness reduction:
- A common method in still image coding is to pre-smooth the image
- before compression encoding. For example, if pre-analysis of a
- frame indicates that serious artifacts will arise if the picture
- were to be coded in the current condition, a pre-anti-aliasing
- filter can be applied. This can be as simple as having a smoothing
- severity proportional to the image activity. The pre-filter can be
- global (same smoothing factor for whole image) or locally adaptive.
- More complex methods will use multirate/multidimensional tools again.
-
- The basic idea of multidimensional/multirate pre-processing is to
- start from source video whose resolution (sampling density) is greater
- than the target source and reconstruction sample rates. This follows
- the basic principles of oversampling, as found in A/D converters.
-
- Most detail is contained in the lower harmonics anyway. Sharp cut-off
- filters are not widely practiced, so the "320 x 480 potential" of VHS
- is never truly realized.
-
- Q. Why use these "advanced" pre-filtering techniques?
-
- A. Think of the DCT and quantizer as an A/D converter. Think of the
- pre-filter as the required anti-alias prefilter found before every
- A/D. The big difference of course is that the DCT quantizer assigns
- a varying number of bits per sample (transform coefficient).
-
- Judging from the normalized activity measured in the pre-analysis
- stage of video encoding, and the target buffer size status, you have
- a fairly good idea of how many bits can be spared for the target
- macroblock, for instance.
-
- Other pre-filtering techniques mostly take into account: texture
- patterns, masking, edges, and motion activity. Many additional
- advanced techniques can be applied at different immediate layers
- of video encoding (picture, slice, macroblock, block, etc.).
-
-
- Q. What are some advanced encoding methods?
-
- Quantizer feedback
- [Thomson patent: installment 3]
-
- Horizontal variance [installment 3]
-
- Motion vector cost: this is true for any syntax element, really.
- Signalling a macroblock quantization factor or a large motion vector
- differential can cost more than making up the difference with extra
- quantized DFD (prediction error) bits. The optimum can be found
- with, for example, a Lagrangian process. In summary, in any compression
- system with side information, there is an optimum point between signalling
- overhead (e.g. prediction) and prediction error.
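-
- One common form of such a Lagrangian process is to minimize
- J = D + lambda * R over the candidate choices, where D is distortion
- and R is the total bit cost (side information plus DFD bits). The
- sketch below assumes that formulation; the candidate figures, the
- dictionary keys and the function names are invented for illustration
- and do not come from any encoder.
-

```python
def lagrangian_cost(candidate, lam):
    """J = D + lambda * R, with R = side-information bits + DFD bits."""
    rate = candidate["side_bits"] + candidate["dfd_bits"]
    return candidate["distortion"] + lam * rate


def pick_candidate(candidates, lam):
    """Return the candidate with the lowest Lagrangian cost."""
    return min(candidates, key=lambda c: lagrangian_cost(c, lam))


# Hypothetical choices for one macroblock.
candidates = [
    # Cheap vector, more prediction error bits.
    {"name": "small differential", "distortion": 100,
     "side_bits": 2, "dfd_bits": 60},
    # Expensive vector that saves fewer DFD bits than it costs.
    {"name": "large differential", "distortion": 100,
     "side_bits": 30, "dfd_bits": 40},
    # Expensive vector plus extra DFD bits, lower distortion.
    {"name": "large differential, finer DFD", "distortion": 60,
     "side_bits": 30, "dfd_bits": 50},
]

quality_pick = pick_candidate(candidates, lam=1.0)["name"]
bitrate_pick = pick_candidate(candidates, lam=4.0)["name"]
print(quality_pick)
print(bitrate_pick)
```

- A small lambda favours low distortion; a large lambda (tight bit
- budget) favours the choice with the least signalling overhead.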
-
- Liberal Interpretations of the Forward DCT
- Borrowing from the concept that the DCT is simply a filter bank, a
- technique that seems to be gaining popularity is basis vector shaping.
- Usually this is combined with the quantization stage since the two are
- tied closely together in a rate-distortion sense. The idea is to use
- the basis vector shaping as a cheap alternative to pre-filtering by
- combining the more desirable data adaptive properties of pre-filtering/
- pre-processing into the transformation process... yet still reconstruct
- a picture in the decoder using the standard IDCT that looks reasonably
- like the source. Some more clever schemes will apply windowing.
- [Warning: watch out for eigenimage/basis vector orthogonality.]
-
- Frequency-domain enhancements:
- Enhancements are applied after the DCT (and possibly quantization)
- stage to the transform coefficients. This borrows from the concept:
- if you don't like the (quantized) transformed results, simply reshape
- them into something you do like.
-
- Temporal spreading of quantization error:
- This method is similar to the original intent behind color subcarrier
- phase alternation by field in the NTSC analog TV standard: for stationary
- areas, noise does not "hang" in one location, but dances about the image
- over time to give a more uniform effect. Distribution makes it more
- difficult for the eye to "catch on" to trouble spots (due to the latent
- temporal response curve of human vision). Simple encoder models tend
- to do this naturally but will not solve all situations.
-
-
- Look-ahead and adaptive frame cycle structures:
- Scene changes
- [installment 3]
-
- It is easy to spot encoders that do not employ any advanced
- encoding techniques: reconstructed video usually contains
- ringing around edges, color bleeding, and lots of noise.
-
-
- Post-processing
-
- (non-linear) Interpolation methods (Wu-Gersho)
- Convex hull projections
- Some ICASSP '93 papers, etc.
-
- Conformance vs. post-processing: Post-processing makes judging
- decoder output for conformance testing near impossible.
- [installment 3]
-
- Q. Why bother to research compressed video when there is a standard?
- A. Despite the worldwide standard, many areas remain open for
- research: advanced encoding and pre-processing, motion estimation,
- macroblock decision models, rate control and buffer management, etc.
- There's practically no end to it.
-
-
- Q. Is so-and-so really MPEG compliant ?
-
- A. At the very least, there are two areas of conformance/compliance in
- MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically
- speaking, video bitstreams consisting entirely of I-frames (such as
- those generated by Xing software) are syntactically compliant with
- the MPEG specification. The I-frame sequence is simply a subset of
- the full syntax. Compliant bitstreams must obey the range limits
- (e.g. motion vectors limited to +/-128, frame sizes, frame rates, etc.)
- and syntax rules (e.g. all slices must commence and terminate with a
- non-skipped macroblock, no gaps between slices, etc.).
-
- Decoders, however, cannot escape true conformance. For example, a
- decoder that cannot decode P or B frames is *not* legal MPEG.
- Likewise, full arithmetic precision must be obeyed before any
- decoder can be called "MPEG compliant." The IDCT, inverse quantizer,
- and motion compensated predictor must meet the specification
- requirements... which are fairly rigid (e.g. no more than 1 least
- significant bit of error between reference and test decoders).
- Real-time conformance is more complicated to measure than arithmetic
- precision, but it is reasonable to expect that decoders that skip
- frames on reasonable bitstreams are not likely to be considered
- compliant.
-
-
-
- Q. What are some journals on related MPEG topics ?
- A.
-
- IEEE Multimedia [first edition Spring 1994]
- IEEE Transactions on Consumer Electronics
- IEEE Transactions on Broadcasting
- IEEE Transactions on Circuits and Systems for Video Technology
- Advanced Electronic Imaging
- Electronic Engineering Times (EE Times)
- IEEE Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- International Broadcasting Convention (IBC)
- Society of Motion Picture and Television Engineers (SMPTE)
- SPIE conference on Visual Communications and Image Processing
- END ---------------------- CUT HERE --------------------- 2/6
-
-