Text File | 1996-11-10 | 54.9 KB | 1,306 lines |
- Archive-name: mpeg-faq/part3
- Last-modified: 1996/06/02
- Version: v 4.1 96/06/02
- Posting-Frequency: bimonthly
-
- frame / Field predicted
-    1. A low-cost encoder which only possesses frame motion
-       estimation may use dct_type to decorrelate the prediction
-       error of a prediction which is inherently field by
-       characteristic.
-
-    2. An intelligent encoder realizes that it is more bit
-       efficient to signal frame prediction with field dct_type
-       for the prediction error than it is to signal a field
-       prediction.
-
- field / Field predicted
-    A typical scenario. A field prediction tends to form a
-    field-correlated prediction error.
-
- frame / Frame predicted
-    A typical scenario. A frame prediction tends to form a
-    frame-correlated prediction error.
-
- field / Frame predicted
-    Makes little sense. If the encoder went through the
-    trouble of finding a field prediction in the first place,
-    why select frame organization for the prediction error?
-
-
- prediction modes now include field, frame, Dual Prime, and 16x8 MC.
- The combinations for Main Profile and Simple Profile are shown below.
-
- Frame pictures
-
- motion_type  motion   fundamental  interpretation
-              vectors  prediction
-              per MB   block size
-                       (after
-                       half-pel)
-
- Frame        1        16x16        same as MPEG-1, with possibly
-                                    different treatment of prediction
-                                    error via dct_type
-
- Field        2        16x8         Two independently coded predictions
-                                    are made: one for the 8 lines which
-                                    correspond to the top field, another
-                                    for the 8 bottom field lines.
-
- Dual Prime   1        16x8         Two independently coded predictions
-                                    are made: one for the 8 lines which
-                                    correspond to the top field, another
-                                    for the 8 bottom field lines. Uses
-                                    averaging of two 16x8 prediction
-                                    blocks from fields of opposite parity
-                                    to form a prediction for the top and
-                                    bottom 8 lines. A second vector is
-                                    derived from the first vector coded
-                                    in the bitstream.
-
-
-
- Field pictures
-
- motion_type  motion   fundamental  interpretation
-              vectors  prediction
-              per MB   block size
-                       (after
-                       half-pel)
-
- Field        1        16x16        same as MPEG-1, with possibly
-                                    different treatment of prediction
-                                    error via dct_type
-
- 16x8         2        16x8         Two independently coded predictions
-                                    are made: one for the upper 16x8
-                                    half of the macroblock, another for
-                                    the lower 16x8 half.
-
- Dual Prime   1        16x16        A single prediction is constructed
-                                    from the average of two 16x16
-                                    predictions taken from fields of
-                                    opposite parity.
-
-
-
- concealment motion vectors can be transmitted in the headers of intra
- macroblocks to help error recovery. When the macroblock data that the
- concealment motion vectors are intended for becomes corrupt, these
- vectors can be used to specify a concealment 16x16 area to be extracted
- from the previous picture. These vectors do not affect the normal
- decoding process, except for motion vector predictions.
-
- Additional chroma_format for 4:2:2 and 4:4:4 pictures. Like MPEG-1,
- Main Profile syntax is strictly limited to 4:2:0 format, however, the
- 4:2:2 format is the basis of the 4:2:2 Profile (aka Studio Profile).
- In 4:2:2 mode, all syntax essentially remains the same except where
- matters of block count are concerned. A coded_block_pattern extension
- was added to handle signaling of the extra two prediction error
- blocks. The 4:4:4 format is currently undefined in any Profile.
-
- chroma_format      multiplex order        application
-                    within macroblock
-
- 4:2:0 (6 blocks)   YYYYCbCr               mainstream television, consumer
-                                           entertainment
-
- 4:2:2 (8 blocks)   YYYYCbCrCbCr           studio production environments,
-                                           professional editing equipment,
-                                           distribution and servers
-
- 4:4:4 (12 blocks)  YYYYCbCrCbCrCbCrCbCr   computer graphics
-
-
-
- Non-linear macroblock quantization was introduced in MPEG-2 to increase
- the precision of quantization at high bit rates, while increasing the
- dynamic range for low bit rate use where a larger step size is needed.
- The mapping of quantization_scale_code to step size may be selected
- between a linear (MPEG-1 style) or non-linear scale on a picture (frame
- or field) basis. The new non-linear range corresponds to a dynamic
- range of 0.5 to 56 with respect to the linear (MPEG-1 style) range of
- 1 to 31.
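-
- As a rough illustration of the two scales, the sketch below maps the
- 5-bit scale code to a step size. The function and table names are
- made up for this example, and the non-linear values are written from
- memory of the MPEG-2 video specification, so they should be checked
- against the standard before being relied on. Halving the results
- gives the ranges quoted above.
-

```python
# Non-linear scale, indexed by the 5-bit code (index 0 is not a valid code).
# Steps of 1 up to 8, then 2 up to 24, then 4 up to 56, then 8 up to 112.
NON_LINEAR_SCALE = (
    [None]
    + list(range(1, 9))
    + list(range(10, 25, 2))
    + list(range(28, 57, 4))
    + list(range(64, 113, 8))
)


def quantiser_scale(code, non_linear):
    """Return the step size for a scale code in 1..31 (hypothetical helper)."""
    if not 1 <= code <= 31:
        raise ValueError("scale code must be in 1..31")
    if non_linear:
        return NON_LINEAR_SCALE[code]
    return 2 * code  # linear, MPEG-1 style


if __name__ == "__main__":
    for code in (1, 8, 16, 24, 31):
        print(code, quantiser_scale(code, False) / 2, quantiser_scale(code, True) / 2)
```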
-
-
- Block:
-
- alternate scan introduced a new run-length entropy scanning pattern
- generally more efficient for the statistics of interlaced video
- signals. Zig-zag scan is the appropriate choice for progressive
- pictures.
-
- intra_dc_precision: the MPEG-1 DC value is always quantized to a
- precision of 8 bits. MPEG-2 introduced 9, 10, and 11 bit precision set
- on a picture basis to increase the accuracy of the DC component, which
- by its very nature has the most significant contribution towards
- picture quality. This is particularly useful at high bit rates to reduce
- posterization. Main and Simple Profiles are limited to 8, 9, or 10 bits
- of precision. The 4:2:2 High Profile, which is geared towards higher
- bitrate applications (up to 50 Mbits/sec), permits all values (up to 11
- bits).
-
- separate quantization matrices for Y and C: luminance (Y) and
- chrominance (Cb,Cr) share a common intra and non-intra DCT coefficient
- quantization 8x8 matrix in MPEG-1 and MPEG-2 Main and Simple Profiles.
- The 4:2:2 Profile permits separate quantization matrices to be
- downloaded for the luminance and chrominance blocks. Cb and Cr still
- share a common matrix.
-
- intra_vlc_format: one of two tables may now be selected at the picture
- layer for variable length codes (VLCs) of AC run-length symbols in
- Intra blocks. The first table is identical to that specified for
- MPEG-1 (dct_coef_next). The newer second table is more suited to the
- statistics of Intra coded blocks, especially in I-frames. The best
- illustration of the difference between Table 0 and Table 1 is the
- length of the symbol which represents End of Block (EOB). In Table 0,
- EOB is 2 bits. In Table 1, it is 4 bits. The implication is that the
- EOB symbol has a probability of about 2^-n within the block (n being
- its length in bits), or from an alternative perspective, there are an
- average of 3 to 4 non-zero AC coefficients in Non-intra blocks, and 9
- to 16 coefficients in Intra blocks. The VLC tree of Table 1 was
- intended to be a subset of Table 0, to aid hardware implementations.
- Both tables have 113 VLC entries (or events).
-
- escape: When no entry in the VLC exists for an AC Run-Level symbol, an
- escape code can be used to represent the symbol. Since there are only
- 63 positions within an 8x8 block following the first coefficient, and
- the dynamic range of the quantized DCT coefficients is [-2048,+2047],
- there are (63*2047), or 128,961 possible combinations of Run and Level
- (the sign bit of the Level follows the VLC). Only the 113 most common
- Run-Level symbols are represented in Table 0 or Table 1. The length of
- the escape symbol (which is always 6 bits) plus the Run and Level
- values in MPEG-1 could be 20 or 28 bits in length. The 20 bit escape
- describes levels in the range [-127,+127]. The 28 bit double escape
- has a range of [-255,+255]. MPEG-2 increased the span to the full
- dynamic range of quantized DCT coefficients, [-2047,+2047], and
- simplified the escape mechanism with a single representation for this
- event. The total length of the MPEG-2 escape codeword is 24 bits (6
- bit VLC followed by a 6-bit Run value, and 12 bit Level value). It was
- an assumption by MPEG-1 designers that no quantized DCT coefficient
- would need greater representation than 10 bits [-255,+255]. Note:
- the MPEG-2 escape mechanism does not permit the value -2048 to be
- represented.
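-
- The 24-bit MPEG-2 escape layout described above can be sketched as
- follows. This is only an illustration: the function names are
- invented here, the escape VLC is assumed to be the bit string 000001,
- and the Level is assumed to be stored as a 12-bit two's complement
- number.
-

```python
ESCAPE_VLC = "000001"  # assumed 6-bit escape code


def encode_escape(run, level):
    """Return the 24-bit escape codeword as a string of '0'/'1' characters."""
    if not 0 <= run <= 63:
        raise ValueError("run must fit in 6 bits")
    if level == 0 or not -2047 <= level <= 2047:
        raise ValueError("level must be non-zero and within [-2047, +2047]")
    # level & 0xFFF yields the 12-bit two's complement pattern.
    return ESCAPE_VLC + format(run, "06b") + format(level & 0xFFF, "012b")


def decode_escape(bits):
    """Inverse of encode_escape: return (run, level)."""
    if len(bits) != 24 or bits[:6] != ESCAPE_VLC:
        raise ValueError("not an escape codeword")
    run = int(bits[6:12], 2)
    level = int(bits[12:], 2)
    if level >= 2048:  # sign bit set
        level -= 4096
    return run, level


if __name__ == "__main__":
    word = encode_escape(5, -300)
    print(word, decode_escape(word))
```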
-
- mismatch control: The arithmetic results of all stages are defined
- exactly by the normative MPEG decoding process, with the single
- exception of the Inverse Discrete Cosine Transform (IDCT). This stage
- can be implemented with a wide variety of IDCT implementations. Some
- are more suited for software, others for programmable hardware, and
- others still for hardwired hardware designs. The IDCT reference formula
- in the MPEG specification would, if directly implemented, consume at
- least 1024 multiply and 1024 addition operations for every block. A
- wide variety of fast algorithms exist which can reduce the count to
- less than 200 multiplies and 500 adds per block by exploiting the
- innate symmetry of the cosine basis functions. A typical fast IDCT
- algorithm would be dwarfed by the cost of the other decoder stages
- combined. Each fast IDCT algorithm has different quantization error
- statistics (fingerprint), although subtle when the precision of the
- arithmetic is, for example, at least 16-bits for the transform
- coefficients and 24-bits for intermediate dot product values.
- Therefore, MPEG cannot standardize a single fast IDCT algorithm. The
- accuracy can be defined only statistically. The IEEE 1180
- recommendation (December 1990) defines the error tolerance between an
- ideal direct-matrix floating point implementation (a direct
- implementation of the MPEG reference formula) and the test IDCT.
-
- Mismatch control attempts to reduce the drift between different IDCT
- algorithms by eliminating bit patterns which statistically have the
- greatest contribution towards mismatches between the variety of
- methods. The reconstructions of two decoders will begin to diverge over
- time since their respective IDCT designs will reconstruct occasional,
- slightly different 8x8 blocks.
-
- MPEG-1's mismatch control method is known canonically as Oddification,
- since it forces all quantized DCT coefficients to odd values. It
- is a slight improvement over its predecessor in H.261. MPEG-2 adopted
- a different method called, again canonically, LSB Toggling, further
- reducing the likelihood of mismatch. Toggling affects only the Least
- Significant Bit (LSB) of the 63rd AC DCT coefficient (the highest
- frequency in the DCT matrix). Another significant difference between
- MPEG-1 and MPEG-2 mismatch control is that, in MPEG-1, oddification is
- performed on the quantized DCT coefficients, whereas in MPEG-2,
- toggling is performed on the DCT coefficients after inverse
- quantization. MPEG-1's mismatch control method favors programmable
- implementation since a block of DCT coefficients when quantized.
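-
- A minimal sketch of LSB Toggling follows, assuming the usual
- formulation in which the 64 inverse-quantized coefficients are summed
- and the last coefficient is nudged by one when the sum is even. The
- function name and the list-of-lists block layout are choices made for
- this example only.
-

```python
def mpeg2_mismatch_control(block):
    """Return a copy of an 8x8 block with the LSB of block[7][7] toggled
    when the sum of all 64 coefficients is even (illustrative sketch)."""
    out = [row[:] for row in block]
    total = sum(sum(row) for row in out)
    if total % 2 == 0:
        # XOR with 1 subtracts 1 from an odd value and adds 1 to an even one,
        # for negative integers as well.
        out[7][7] ^= 1
    return out


if __name__ == "__main__":
    flat = [[0] * 8 for _ in range(8)]
    print(mpeg2_mismatch_control(flat)[7][7])
```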
-
- Sample:
- The two chrominance pictures (Cb, Cr) possess only half the resolution
- in both the horizontal and vertical direction as the luminance picture
- (Y). This is the definition of the 4:2:0 chroma format. Most
- television displays require that at least the vertical chrominance
- resolution matches the luminance (4:2:2 chroma format). Computer
- displays may further still demand that the horizontal resolution also
- be equivalent (4:4:4 chroma format). There are a variety of filtering
- methods for interpolating the chrominance samples to match the sample
- density of luminance. However, the official location or center of the
- lower resolution chrominance sample should influence the filter design
- (relative tap weights), otherwise the chrominance plane can appear to
- be shifted by a fractional sample in the wrong direction.
-
- The subsampled MPEG-1 chroma position has a center exactly half way
- between the four nearest neighboring luminance samples. To be
- consistent with the subsampled chrominance positions of 4:2:2
- television signals, MPEG-2 moved the center of the chrominance samples
- to be co-located horizontally with the luminance samples.
-
-
- Misc.:
-
- copyright_id extension can identify whether a sequence or subset of
- frames within the sequence is copyrighted, and provides a unique 64-bit
- copyright_id_number registered with the ISO/IEC.
-
- Syntax can now signal frame sizes as large as 16383 x 16383. Since
- MPEG-1 employed a meager 12 bits to describe horizontal_size and
- vertical_size, the range was limited to 4095 x 4095. However, MPEG's
- Levels prescribe important interoperability points for practical
- decoders. Constrained Parameters MPEG-1 and MPEG-2 Low Level limit the
- sample rate to 352x240x30 Hz. MPEG-2's Main Level defines the limit at
- 720x480x30 Hz. Of course, this is simply the restriction on the
- product of horizontal_size, vertical_size, and frame_rate. The Level
- also places separate restrictions on each of these three
- variables.
-
- Reflecting the more television oriented manner of MPEG-2, the optional
- sequence_display_extension() header can specify the chromaticity of the
- source video signal as it was prior to representation by MPEG syntax.
- This information includes: whether the original video_format was
- composite or component, the opto-electronic transfer_characteristics,
- and RGB->YCbCr matrix_coefficients. The picture_display_extension()
- provides more localized source composite video characteristics on a
- frame by frame basis (not field-by-field), with the syntax elements:
- field_sequence, sub_carrier_phase, and burst_amplitude. This
- information can be used by the display's post-processing stage to
- reproduce a more refined display sequence.
-
- Optional pan & scan syntax was introduced which tells a decoder on a
- frame-by-frame basis how to, for example, window a 4:3 image within the
- wider 16:9 aspect ratio of the coded frame. The vertical pan offset
- can be specified to within 1/16th pixel accuracy.
-
- [Figure: mpeg2pan.gif]
-
-
- How does MPEG syntax facilitate parallelism ?
-
- For MPEG-1, slices may consist of an arbitrary number of macroblocks.
- They can be independently decoded once the picture header side
- information is known. For parallelism below the slice level, the coded
- bitstream must first be mapped into fixed-length elements. Further,
- since macroblocks have coding dependencies on previous macroblocks
- within the same slice, the data hierarchy must be pre-processed down to
- the layer of DC DCT coefficients. After this, blocks may be
- independently inverse transformed and quantized, temporally predicted,
- and reconstructed to buffer memory. Parallelism is usually more of a
- concern for encoders. In many encoders today, block matching (motion
- estimation) and some rate control stages (such as activity and/or
- complexity measures) are processed for macroblocks independently.
- Finally, with the exception that all macroblock rows in Main Profile
- MPEG-2 bitstreams must contain at least one slice, an encoder has the
- freedom to choose the slice structure.
-
- What is the MPEG color space and sample precision?
-
- MPEG strictly specifies the YCbCr color space, not YUV or YIQ or YPbPr
- or YDrDb or any other of the many fine varieties of color difference
- spaces.
- Regardless of any bitstream parameters, MPEG-1 and MPEG-2 Video Main
- Profile specify the 4:2:0 chroma_format, where the color difference
- channels (Cb, Cr) have half the "resolution" or sample grid density in
- both the horizontal and vertical direction with respect to luminance.
-
- MPEG-2 High Profile includes an option for 4:2:2 chroma_format, as does
- the MPEG 4:2:2 Profile (a.k.a. Studio Profile) naturally. Applications
- for the 4:2:2 format can be found in professional broadcasting,
- editing, and contribution-quality distribution environments. The
- drawback of the 4:2:2 format is simply that it increases the size of
- the macroblock from six 8x8 blocks (4:2:0) to eight, while increasing
- the frame buffer size and decoding bandwidth by the same amount (33%).
- This increase places the buffering memories well past the magic
- 16-Mbit limit for semiconductor DRAM devices, assuming the pictures are
- stored with a maximum of 414,720 pixels (720 pixels/line x 576
- lines/frame). The maximum allowable pixel resolution could be reduced
- by about 1/4 to compensate (e.g. 544 x 576). However, if hardware
- decoders operate on a macroblock basis in the pipeline, on-chip static
- memories (SRAM) will still increase by 1/3. The benefits offered by 1/3
- more pixels generally outweigh those of full vertical chrominance
- resolution. Other arguments favoring 4:2:0 over 4:2:2 include:
-
- Vertical decimation increases compression efficiency by reducing
- syntax overhead posed in an 8 block (4:2:2) macroblock structure.
-
- You're compressing the hell out of the video signal, so what possible
- difference can the 0:0:2 chrominance high-pass make?
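-
- The memory argument can be checked with a little arithmetic. The
- sketch below assumes 8 bits per sample and, as an assumption not
- stated in the text, a decoder that holds three full frames (two
- reference pictures plus one picture under reconstruction). The names
- are illustrative only.
-

```python
BITS_PER_SAMPLE = 8
# Average samples per pixel: one luminance sample plus the chroma share.
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}
DRAM_16_MBIT = 16 * 2**20


def frame_store_bits(width, height, chroma_format, frames=3):
    """Bits needed to hold `frames` decoded pictures (assumed three by default)."""
    samples = width * height * SAMPLES_PER_PIXEL[chroma_format]
    return int(samples * BITS_PER_SAMPLE * frames)


if __name__ == "__main__":
    for size, fmt in (((720, 576), "4:2:0"), ((720, 576), "4:2:2"), ((544, 576), "4:2:2")):
        bits = frame_store_bits(size[0], size[1], fmt)
        print(size, fmt, bits, "fits" if bits <= DRAM_16_MBIT else "exceeds 16 Mbit")
```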
-
- Is 4:2:0 the same as 4:1:1 ?
-
- No, no, definitely no. The following table illustrates the nuances
- between the different chroma formats for a frame with pixel dimensions
- of 720 pixels/line x 480 lines/frame.
-
- CCIR 601 (60 Hz) image format
-
- chroma   pixels/  lines/  pixels/  lines/  horizontal   vertical
- format   line     frame   line     frame   subsampling  subsampling
-          Y        Y       Cb, Cr   Cb, Cr  factor       factor
-
- 4:4:4    720      480     720      480     none         none
- 4:2:2    720      480     360      480     2:1          none
- 4:2:0    720      480     360      240     2:1          2:1
- 4:1:1    720      480     180      480     4:1          none
- 4:1:0    720      480     180      120     4:1          4:1
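-
- The rows of the table follow directly from the horizontal and
- vertical subsampling factors, as this small sketch shows. The
- dictionary and function names are invented for the example.
-

```python
# (horizontal factor, vertical factor) for each chroma format in the table.
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
    "4:1:0": (4, 4),
}


def chroma_size(width, height, chroma_format):
    """Return (pixels/line, lines/frame) of each chroma plane."""
    h_factor, v_factor = SUBSAMPLING[chroma_format]
    return width // h_factor, height // v_factor


if __name__ == "__main__":
    for fmt in SUBSAMPLING:
        print(fmt, chroma_size(720, 480, fmt))
```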
-
-
- 3:2:2, 3:1:1, and 3:1:0 are less common variations, but have been
- documented. As shocking as it may seem, the 4:1:0 ratio was used by
- Intel's DVI for several years.
-
- The 130 microsecond gap between successive 4:2:0 lines in progressive
- frames, and 260 microsecond gap in interlaced frames, can introduce
- some difficult vertical frequencies, but most can be alleviated through
- pre-processing.
-
- What is the sample precision of MPEG ? How many colors
- can MPEG represent ?
-
- By definition, MPEG samples have no more and no less than 8 bits of
- uniform sample precision (256 quantization levels). For luminance
- (which is unsigned) data, black corresponds to level 0, white is level
- 255. However, under CCIR Recommendation 601, luminance (Y)
- levels 0 through 15 and 236 through 255 are reserved for blanking
- signal excursions. MPEG currently has no such clipped excursion
- restrictions, although a decoder might take care to ensure active
- samples do not exceed these limits. With three color components per
- pixel, the total combination is roughly 16.8 million colors (i.e.
- 24 bits).
-
-
- How are the subsampled chroma samples sited?
-
-
- It is moderately important to properly co-site chroma samples,
- otherwise a sort of chroma shifting effect (exhibited as a halo) may
- result when the reconstructed video is displayed. In MPEG-1 video, the
- chroma samples are exactly centered between the 4 luminance samples
- (Fig 1.) To maintain compatibility with the CCIR 601 horizontal
- chroma locations and simplify implementation (eliminate need for phase
- shift), MPEG-2 chroma samples are arranged as per Fig.2.
-
- Y   Y   Y   Y         Y   Y   Y   Y         YC  Y   YC  Y
-   C       C           C       C
- Y   Y   Y   Y         Y   Y   Y   Y         YC  Y   YC  Y
-
- Y   Y   Y   Y         Y   Y   Y   Y         YC  Y   YC  Y
-   C       C           C       C
- Y   Y   Y   Y         Y   Y   Y   Y         YC  Y   YC  Y
-
- Fig.1 MPEG-1          Fig.2 MPEG-2          Fig.3 MPEG-2 and
- 4:2:0 organization    4:2:0 organization    CCIR Rec. 601
-                                             4:2:2 organization
-
-
- How do you tell an MPEG-1 bitstream from an MPEG-2
- bitstream ?
-
- A. All MPEG-2 bitstreams must contain specific extension headers that
- immediately follow MPEG-1 headers. At the highest layer, for example,
- the MPEG-1 style sequence_header() is followed by sequence_extension().
- Some extension headers are specific to MPEG-2 profiles. For example,
- sequence_scalable_extension() is not allowed in Main Profile
- bitstreams.
-
- A simple program need only scan the coded bitstream for byte-aligned
- start codes to determine whether the stream is MPEG-1 or MPEG-2.
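-
- Such a scan might look like the sketch below, which works on a video
- elementary stream held in memory. It assumes that the sequence header
- start code ends in the byte B3 and the extension start code in B5;
- the function names are made up for the example, and no Systems layer
- parsing is attempted.
-

```python
START_PREFIX = b"\x00\x00\x01"
SEQUENCE_HEADER = 0xB3  # assumed value of the sequence_header() start code
EXTENSION = 0xB5        # assumed value of the extension start code


def start_codes(data):
    """Yield (offset, code byte) for every byte-aligned start code."""
    pos = data.find(START_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        yield pos, data[pos + 3]
        pos = data.find(START_PREFIX, pos + 3)


def classify_video_stream(data):
    """Return "MPEG-2" if the first sequence header is immediately followed
    by an extension start code, "MPEG-1" if not, None if no header is found."""
    codes = [code for _, code in start_codes(data)]
    for index, code in enumerate(codes):
        if code == SEQUENCE_HEADER:
            following = codes[index + 1] if index + 1 < len(codes) else None
            return "MPEG-2" if following == EXTENSION else "MPEG-1"
    return None


if __name__ == "__main__":
    sample = START_PREFIX + b"\xb3" + b"\xff" * 8 + START_PREFIX + b"\xb5" + b"\xff" * 6
    print(classify_video_stream(sample))
```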
-
- What are start codes?
-
- These 32-bit byte-aligned codes provide a mechanism for cheaply
- searching coded bitstreams for commencement of various layers of video
- without having to actually parse variable-length codes or perform any
- decoder arithmetic. Start codes also provide a mechanism for
- resynchronization in the presence of bit errors. A start code may be
- preceded by an arbitrary number of zero bytes. The zero bytes can be
- used to guarantee that a start code occurs within a certain location, or
- by rate control to increase the bitrate of a coded bitstream.
-
- Coded block pattern
-
- (CBP -- not to be confused with Constrained Parameters!) When the frame
- prediction is particularly good, the displaced frame difference (DFD, or
- temporal macroblock prediction error) tends to be small, often with
- entire block energy being reduced to zero after quantization. This
- usually happens only at low bit rates. Coded block patterns prevent
- the need for transmitting EOB symbols in those zero coded blocks.
- Coded block patterns are transmitted in the macroblock header only if
- the macroblock_type flag indicates so.
-
- Why is the DC value always divided by 8 ?
-
- Clarification point: The DC value of Intra coded blocks is quantized by
- a constant stepsize of 8 only in MPEG-1, rendering the 11-bit dynamic
- range of the IDCT DC coefficient to 8 bits of accuracy. MPEG-2 allows
- for DC precision of 8, 9, 10, or 11 bits. The quantization stepsize is
- fixed for the duration of the picture, set by the intra_dc_precision
- field in the picture_coding_extension() header.
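-
- The relationship between precision and step size can be sketched as
- below, assuming that the 2-bit intra_dc_precision field takes the
- values 0 to 3 for 8 to 11 bits. The helper names are hypothetical.
-

```python
def dc_precision_bits(intra_dc_precision):
    """Map the 2-bit field (0..3) to the DC precision in bits (8..11)."""
    if not 0 <= intra_dc_precision <= 3:
        raise ValueError("intra_dc_precision is a 2-bit field")
    return 8 + intra_dc_precision


def dc_step_size(intra_dc_precision):
    """Step size applied to the 11-bit DC range: 8, 4, 2 or 1."""
    return 1 << (11 - dc_precision_bits(intra_dc_precision))


if __name__ == "__main__":
    for field in range(4):
        print(field, dc_precision_bits(field), dc_step_size(field))
```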
-
- Why is there a special VLC for dct_coef_first?
-
- Since the coded_block_pattern in NON-INTRA macroblocks signals every
- possible combination of all-zero valued and non-zero blocks, the
- dct_coef_first mechanism assigns a different meaning to the VLC
- codeword (run = 0, level =+/- 1) that would otherwise represent EOB
- (10) as the first coefficient in the zig-zag ordered Run-Level token
- list.
-
- What's the deal with End of Block?
-
- Saves unnecessary run-length codes. At optimal bitrates, there tends
- to be few AC coefficients concentrated in the early stages of the
- zig-zag vector. In MPEG-1, the 2-bit length of EOB implies that there
- is an average of only 3 or 4 non-zero AC coefficients per block. In
- MPEG-2 Intra (I) pictures, with a 4-bit EOB code in Table 1, this
- estimate is between 9 and 16 coefficients. Since EOB is required for
- all coded blocks, its absence can signal that a syntax error has
- occurred in the bitstream.
-
- What's this "Macroblock stuffing," dammit?
-
- A genuine pain for VLSI implementations, macroblock stuffing was
- included in MPEG-1 to maintain smoother, constant bitrate control for
- encoders. However, with normalized complexity/activity measures and
- buffer management performed a priori (before coding of the macroblock,
- for example) and local monitoring of coded data buffer levels now a
- common operation in encoders, (e.g. MPEG-2 encoder Test Model), the
- need for such localized bitrate smoothing evaporated. Stuffing can be
- achieved through slice start code padding if required. A good rule of
- thumb is: if you often find yourself wishing for stuffing more than
- once per slice, you probably don't have a very good rate control
- algorithm. Nonetheless, to avoid any temptation, macroblock stuffing
- is now illegal in MPEG-2 (A general syntax restriction brought to you
- by the Implementation Studies Subgroup!)
-
- What's the deal with slice_vertical_position and
- macroblock_address_increment?
-
- The absolute position of the first macroblock within a slice is known
- by the combination of slice_vertical_position and the
- macroblock_address_increment. Therefore, the proper place of a lost
- slice found in a highly corrupt bitstream can be located exactly within
- the picture. These two syntax elements are also the only known means
- of detecting slice gaps: areas of the picture which are not
- represented with any information (including skipped macroblocks). A
- slice gap occurs when the current macroblock address of the first
- macroblock in a slice is greater than the previous macroblock address
- by more than 1 macroblock unit. A slice overlap occurs when the current
- macroblock address is less than or equal to the previous macroblock's
- address. The previous macroblock in both instances is the last known
- macroblock within the previous slice. Because of the semantic
- interpretation of slice gaps and overlaps, and because of the syntactic
- restrictions for slice_vertical_position and
- macroblock_address_increment, it is not syntactically possible for a
- skipped macroblock to be represented in the first and last positions of
- a slice. In the past, some (bad) encoders would attempt to signal a
- run of skipped macroblocks to the end of the slice. These evil skipped
- macroblocks should be interpreted by a compliant decoder as a gap, not
- as a string of skipped macroblocks.
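-
- The gap and overlap rules can be written down directly. In the sketch
- below the address formula assumes MPEG-2 style numbering, where
- slice_vertical_position counts macroblock rows from 1 and the
- increment counts from the left edge of that row; the function names
- and the mb_width parameter are inventions of this example.
-

```python
def first_mb_address(slice_vertical_position, macroblock_address_increment, mb_width):
    """Absolute address (raster order, from 0) of the first macroblock in a slice."""
    row = slice_vertical_position - 1
    return row * mb_width + macroblock_address_increment - 1


def classify_slice_start(previous_mb_address, current_mb_address):
    """Compare against the last macroblock of the previous slice."""
    delta = current_mb_address - previous_mb_address
    if delta > 1:
        return "gap"
    if delta <= 0:
        return "overlap"
    return "contiguous"


if __name__ == "__main__":
    width = 45  # macroblocks per row for a 720-pixel-wide picture
    start = first_mb_address(2, 3, width)
    print(start, classify_slice_start(44, start))
```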
-
- What is meant by modified Huffman VLC tables?
-
- The VLC tables in MPEG are not Huffman tables in the true sense of
- Huffman coding, but are more like the tables used in Group 3 fax. They
- are entropy constrained, that is, non-downloadable and optimized for a
- limited range of bit rates (sweet spots). A better way would be to say
- that the tables are optimized for a range of ratios of bit rate to
- sample rate (e.g. 0.25 bits/pixel to 1.0 bits/pixel). With the
- exception of a few codewords, the larger tables were carried over from
- the H.261 standard drafted in the year 1990. This includes the AC
- run-level symbols, coded_block_pattern, and macroblock_address_increment.
- MPEG-2 added an "Intra table," also called "Table 1". Note that the
- dct_coefficient tables assume positive/negative coefficient PMF
- symmetry.
-
-
- How does MPEG handle 3:2 pulldown?
-
- MPEG-1 video decoders had to decide for themselves when to perform 3:2
- pulldown if it was not indicated in the presentation time stamps (PTS)
- of the Systems layer bitstream. MPEG-2 provides two flags
- (repeat_first_field, and top_field_first) which explicitly describe
- whether a frame or field is to be repeated. In progressive sequences,
- frames can be repeated 2 or 3 times. Simple and Main Profile are
- limited to repeated fields only. It is a general syntactic restriction
- that repeat_first_field can only be signaled (value ==1) in a frame
- structured picture. It makes little sense to repeat field pictures in
- an interlaced video signal since the whole process of 3:2 pulldown
- conversion was meant to convert progressive, film sequences to the
- display frame rate of interlaced television.
-
- In the most common scenario, a film sequence will contain 24 frames
- every second. The frame_rate_code element in the sequence header will
- indicate 30 frames/sec, however. On average, every other coded frame
- will signal a repeat field (repeat_first_field==1) to pad the frame
- rate from 24 Hz to 30 Hz:
-
-
- (24 coded frames/sec)*(2 fields/coded frame)*(5 display fields/4 coded
- fields) = 60 display fields/sec = 30 display frames/sec
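-
- A small sketch of how the two flags expand coded frames into display
- fields is given below. The four-frame flag pattern is one common
- cadence chosen for illustration, not something mandated by the text,
- and the function name is hypothetical.
-

```python
def display_fields(frames):
    """Expand (name, top_field_first, repeat_first_field) tuples into the
    ordered list of displayed fields, e.g. "AT" for the top field of frame A."""
    fields = []
    for name, top_field_first, repeat_first_field in frames:
        first, second = ("T", "B") if top_field_first else ("B", "T")
        fields.append(name + first)
        fields.append(name + second)
        if repeat_first_field:
            fields.append(name + first)  # the repeated field
    return fields


# Four film frames; every other one repeats its first field.
FILM = [
    ("A", True, False),
    ("B", True, True),
    ("C", False, False),
    ("D", False, True),
]

if __name__ == "__main__":
    out = display_fields(FILM)
    print(out)
    print(24 * len(out) // len(FILM), "display fields/sec")
```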
-
-
- After all this standardization, what's left for research?
-
-
- A. Despite the fact that a comprehensive worldwide standard now exists
- for digital video, many areas remain wide open for research: advanced
- encoding and pre-processing, motion estimation, macroblock decision
- models, rate control and buffer management in editing environments,
- implementation complexity reduction, etc. Many areas have yet to be
- solved (and discovered).
-
- Are some encoders better than others ?
-
- A. Definitely. For example, the motion estimation search range of an
- encoder has great influence over final picture quality. At a certain
- point a very large range can actually become detrimental (it may
- encourage large differential motion vectors). Practical ranges are
- usually between +/- 15 and +/- 32. As the range doubles, for instance,
- the search area quadruples (the classic relationship between an
- increase in linear dimension and the resulting increase in area).
-
- Rate control marks a second tell-tale area where some encoders perform
- significantly better than others.
-
- And finally, the degree of "pre-processing" (now a popular buzzword in
- the business) signals that the encoder belongs to an elite marketing
- class.
-
-
- Is the encoder standardized ?
-
- A. The encoder rests just outside the normative scope of the standard,
- as long as the bitstreams it produces are compliant. The decoder,
- however, is almost deterministic: a given bitstream should reconstruct
- to a unique set of pictures. However, since the IDCT function is the
- ONLY non-normative stage in the decoder, an occasional error of a Least
- Significant Bit per prediction iteration is permitted. The designer is
- free to choose among many DCT algorithms and implementations. The IEEE
- 1180 test referenced in Annex A of the MPEG-1 (ISO/IEC 11172-2) and
- MPEG-2 (ISO/IEC 13818-2) Video specifications spells out the
- statistical mismatch tolerance between the Reference IDCT, which is a
- separable 8x1 "Direct Matrix" DCT implemented with 64-bit floating
- point accuracy, and the IDCT you are testing for compliance.
-
-
- What is the TM (Test Model) ?
- What is the TM rate control and adaptive quantization technique ?
-
- A. The Test model (MPEG-2) and Simulation Model (MPEG-1) were not, by
- any stretch of the imagination, meant to epitomize state-of-the art
- encoding quality. They were, however, designed to exercise the syntax,
- verify proposals, and test the relative compression performance of
- proposals in a timely manner that could be duplicated by
- co-experimenters. Without simplicity, there would have been no doubt
- endless debates over model interpretation. Regardless of all else,
- more advanced techniques would probably trespass into proprietary
- territory.
-
- The final test model for MPEG-2 is TM version 5b, a.k.a. TM version 6,
- produced in March 1993 (the time when the MPEG-2 video syntax was
- frozen). The final MPEG-1 simulation model is version 3 (SM-3). The
- MPEG-2 TM rate control method offers a dramatic improvement over the SM
- method. TM adds more accurate estimation of macroblock complexity
- through use of limited a priori information. Macroblock quantization
- adjustments are computed on a macroblock basis, instead of
- once-per-macroblock row (which in the SM-3 case consisted of an entire
- slice).
-
- How does the TM work?
-
- Rate control and adaptive quantization are divided into three steps:
-
- Step One: Target Bit Allocation
-
- In Complexity Estimation, the global complexity measures assign
- relative weights to each picture type (I,P,B). These weights (Xi, Xp,
- Xb) are reflected by the typical coded frame size of I, P, and B
- pictures (see typical frame size discussion). I pictures are usually
- assigned the largest weight since they have the greatest stability
- factor in an image sequence and contain the most new information in a
- sequence. B pictures are assigned the smallest weight since B picture
- energy does not propagate into other pictures and B pictures are
- usually more highly correlated with neighboring P and I pictures than
- P pictures are.
-
- The bit target for a frame is based on the frame type, the remaining
- number of bits left in the Group of Pictures (GOP) allocation, and the
- immediate statistical history of previously coded pictures (sort of a
- moving average global rate control, if you will).
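-
- For readers who want numbers, here is a sketch of the target
- computation. The formulas and the constants Kp = 1.0 and Kb = 1.4 are
- written from memory of the Test Model 5 document and should be
- verified against it; the function and parameter names are chosen for
- this example. R is the number of bits remaining in the GOP, and n_p
- and n_b are the numbers of P and B pictures still to be coded.
-

```python
K_P = 1.0  # assumed TM5 constant
K_B = 1.4  # assumed TM5 constant


def target_bits(picture_type, R, n_p, n_b, x_i, x_p, x_b, bit_rate, picture_rate):
    """Bit target for the next picture of the given type ("I", "P" or "B")."""
    floor = bit_rate / (8.0 * picture_rate)  # lower bound on any target
    if picture_type == "I":
        target = R / (1.0 + n_p * x_p / (x_i * K_P) + n_b * x_b / (x_i * K_B))
    elif picture_type == "P":
        target = R / (n_p + n_b * K_P * x_b / (K_B * x_p))
    elif picture_type == "B":
        target = R / (n_b + n_p * K_B * x_p / (K_P * x_b))
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    return max(target, floor)


if __name__ == "__main__":
    rate, fps = 4_000_000, 30.0
    gop_bits = rate * 12 / fps  # 12-picture GOP: 1 I, 3 P, 8 B
    # Relative complexities in the ratio 160 : 60 : 42 (assumed starting values).
    xi, xp, xb = 160.0, 60.0, 42.0
    for kind in "IPB":
        print(kind, round(target_bits(kind, gop_bits, 3, 8, xi, xp, xb, rate, fps)))
```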
-
- Step Two: Rate Control via Buffer Monitoring
-
- Rate control attempts to adjust bit allocation if there is significant
- difference between the target bits (anticipated bits) and actual coded
- bits for a block of data. If the virtual buffer begins to overflow,
- the macroblock quantization step size is increased, resulting in a
- smaller yield of coded bits in subsequent macroblocks. Likewise, if
- underflow begins, the step size is decreased. The Test Model
- approximates that the target picture has spatially uniform distribution
- of bits. This is a safe approximation since spatial activity and
- perceived quantization noise are almost inversely proportional. Of
- course, the user is free to design a custom distribution, perhaps
- targeting more bits in areas that contain more complex yet highly
- perceptible data such as text.
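-
- The buffer feedback can be sketched as follows. The virtual buffer
- fullness and the "reaction parameter" r = 2 * bit_rate / picture_rate
- follow the Test Model 5 description as best recalled here, so treat
- the exact constants as assumptions; the function names are
- illustrative. The uniform spatial distribution appears as the term
- target * mb_index / mb_count.
-

```python
def reaction_parameter(bit_rate, picture_rate):
    """Assumed TM5 reaction parameter r."""
    return 2.0 * bit_rate / picture_rate


def reference_quant(d0, bits_so_far, target, mb_index, mb_count, r):
    """Reference quantizer for macroblock number mb_index (counting from 0).

    d0 is the virtual buffer fullness at the start of the picture and
    bits_so_far the bits actually produced by the macroblocks coded so far.
    """
    expected = target * mb_index / mb_count  # uniform distribution of bits
    fullness = d0 + bits_so_far - expected
    return fullness * 31.0 / r


if __name__ == "__main__":
    r = reaction_parameter(4_000_000, 30.0)
    d0 = 10.0 * r / 31.0
    print(reference_quant(d0, 50_000, 200_000, 330, 1350, r))
```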
-
- Step Three: Adaptive Quantization
-
- The final step modulates the macroblock quantization step size obtained
- in Step 2 by a local activity measure. The activity measure itself is
- normalized against the most recently coded picture of the same type (I,
- P, or B). The activity for a macroblock is chosen as the minimum among
- the four 8x8 block luminance variances. Choosing the minimum block is
- part of the concept that a macroblock is no better than the block of
- highest visible distortion (weakest link in the chain).
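-
- A minimal Python sketch of this modulation follows. The minimum over
- the four luminance block variances comes from the description above;
- the "1 +" offset, the normalization formula (which keeps the factor
- between 0.5 and 2), and the function names are assumptions recalled
- from TM5 and should be verified against the Test Model text.
-
```python
# Sketch of activity-based adaptive quantization (Step Three).


def block_variance(block):
    """Population variance of a flat sequence of 64 luminance samples."""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n


def macroblock_activity(luma_blocks):
    """Activity of a macroblock: 1 + the smallest 8x8 luminance variance."""
    return 1.0 + min(block_variance(b) for b in luma_blocks)


def normalized_activity(act, avg_act):
    """Scale factor relative to the average activity of the last picture
    of the same type (assumed TM5 form)."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def mquant(q_ref, act, avg_act):
    """Modulate the Step Two quantizer and clip it to the legal 1..31 range."""
    q = int(round(q_ref * normalized_activity(act, avg_act)))
    return min(max(q, 1), 31)
```
-
- A macroblock containing even one flat block gets a low activity and
- therefore a finer quantizer, which is the "weakest link" idea at work.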
-
- Decision:
- [deferred to later date]
-
- Can motion vectors be used to determine object velocity?
-
- Motion vector information cannot be reliably used as a means of
- determining object velocity unless the encoder model specifically set
- out to do so. First, encoder models that optimize picture quality
- generate vectors that typically minimize prediction error and,
- consequently, the vectors often do not represent true object
- translation from picture-to-picture. Standards converters that
- resample one frame rate to another (as in NTSC to PAL) use different
- methods (motion vector field estimation, edge detection, etc.) that
- are not concerned with Rate-Distortion theory. Second, motion vectors
- are not transmitted for all macroblocks anyway.
-
- Is it possible to code interlaced video with MPEG-1 syntax?
-
- A. Two methods can be applied to interlaced video that maintain
- syntactic compatibility with MPEG-1 (which was originally designed for
- progressive frames only). In the field concatenation method, the
- encoder model can carefully construct predictions and prediction errors
- that realize good compression but maintain field integrity (distinction
- between adjacent fields of opposite parity). Some pre-processing
- techniques can also be applied to the interlaced source video that
- would, e.g., lessen sharp vertical frequencies.
-
- This technique is not terribly efficient, of course. On the other hand,
- if the original source was progressive (e.g. film), then it is simpler
- to convert the interlaced source to a progressive format before
- encoding. (MPEG-2 would then only offer slightly superior performance
- through such MPEG-2 enhancements as greater DC coefficient precision,
- non-linear mquant, intra VLC, etc.) Reconstructed frames are usually
- re-interlaced in the Display process following the decoding stages.
-
- The second syntactically compatible method codes fields as separate
- pictures. Rumors have spread that this approach does not work nearly
- as well as the "pretend it's really a frame" method.
-
- Can MPEG be used to code still frames ?
-
- Yes. MPEG Intra pictures are similar to baseline sequential JPEG pictures.
-
- There are, of course, advantages and disadvantages to using MPEG over
- JPEG to represent still pictures.
-
- Disadvantages:
-
- 1. MPEG has only one color space (YCbCr)
-
- 2. MPEG-1 and MPEG-2 Main Profile luma and chroma share quantization
- and VLC tables (4:2:0 chroma_format)
-
- 3. MPEG-1 is syntactically limited to 4k x 4k images, and 16k x 16k for MPEG-2.
-
- Advantages:
-
- 1. MPEG possesses adaptive quantization which permits better rate
- control and spatial masking.
-
- 2. With its limited still image syntax, MPEG averts any temptation to
- use unnecessary, expensive, and academic encoding methods that have
- little impact on the overall picture quality (you know who you are).
-
- 3. Philips' CD-I spec. has a requirement for an MPEG still frame mode,
- with double SIF image resolution. This is technically feasible mostly
- thanks to the fact that only one picture buffer is needed to decode a
- still image instead of the 2.5 to 3 buffers needed for IPB sequences.
-
-
- Why was the 8x8 DCT size chosen?
-
- A. Experiments showed that little compaction gain could be achieved with
- larger transform sizes, especially in light of the increased
- implementation complexity. A fast DCT algorithm will require roughly
- double the number of arithmetic operations per sample when the linear
- transform point size is doubled. Naturally, the best compaction
- efficiency has been demonstrated using locally adaptive block sizes
- (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4) [See Gary Sullivan and Rich
- Baker "Efficient Quadtree Coding of Images and Video," ICASSP 91, pp
- 2661-2664.].
-
- Inevitably, adaptive block transformation sizes introduce additional
- side information overhead while forcing the decoder to implement
- programmable or hardwired recursive DCT algorithms. If the DCT size
- becomes too large, then more edges (local discontinuities) and the like
- become absorbed into the transform block, resulting in wider
- propagation of Gibbs (ringing) and other unpleasant phenomena.
- Finally, with larger transform sizes, the DC term is even more
- critically sensitive to quantization noise.
-
- Why was the 16x16 prediction size chosen?
-
- The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8
- blocks, given the normative 4:2:0 chroma ratio. Starting with medium
- size images, the 16x16 area provides a good balance between side
- information overhead & complexity and motion compensated prediction
- accuracy. In gist, experiments showed that the 16x16 was a good
- trade-off between complexity and coding efficiency.
-
- What do B-pictures buy you?
-
- A. Since bi-directional macroblock predictions are an average of two
- macroblock areas, noise is reduced at low bit rates (like a 3-D filter,
- if you will). At nominal MPEG-1 video (352 x 240 x 30, 1.15 Mbit/sec)
- rates, it is said that B-frames improve SNR by as much as 2 dB. (0.5
- dB gain is usually considered worthwhile in MPEG). However, at higher
- bit rates, B-frames become less useful since they inherently do not
- contribute to the progressive refinement of an image sequence (i.e.
- not used as prediction by subsequent coded frames). Regardless,
- B-frames are still politically controversial.
-
- B pictures are interpolative in two ways: 1. predictions in the
- bi-directional macroblocks are an average from block areas of two
- pictures 2. B pictures "fill in" like a digital spackle the immediate
- 3-D video signal without contributing to the overall signal quality
- beyond that immediate point in time. In other words, a B picture,
- regardless of its internal make-up of macroblock types, has a life
- limited only to itself. As mentioned before, B picture energy does not
- propagate into other frames. In a sense, bits spent on B pictures are
- wasted.
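-
- The averaging of two macroblock predictions mentioned above can be
- sketched in a few lines of Python. The function name is hypothetical,
- and the rounding rule (halves rounded up) is stated here as an
- assumption about the usual integer averaging used for 8-bit samples.
-
```python
# Sketch: form a bi-directional prediction by averaging two blocks.


def bidirectional_prediction(forward, backward):
    """Average two equally sized prediction blocks sample by sample.

    Both inputs are flat sequences of non-negative integer samples.
    Adding 1 before the shift rounds exact halves upward.
    """
    if len(forward) != len(backward):
        raise ValueError("prediction blocks must have the same size")
    return [(f + b + 1) >> 1 for f, b in zip(forward, backward)]
```
-
- Because independent noise in the two reference areas is averaged, the
- resulting prediction is smoother than either one alone, which is the
- filtering effect described above.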
-
- Why do some people hate B-frames?
-
- A. Computational complexity, bandwidth, end-to-end delay, and picture
- buffer size are the four B-frame Pet Peeves. Computational complexity
- in the decoder is increased since some macroblock modes require
- averaging between two block predictions (macroblock_motion_forward==1
- && macroblock_motion_backward==1).
-
- Worst case, memory bandwidth is increased an extra 15.2 MByte/s
- (assuming 4:2:0 chroma_format at Main Level, not including any half
- pel or page-mode overhead) for this extra directional prediction. To
- really rub it in, an extra picture buffer is needed to store the future
- reference picture (backwards prediction frame). Finally, an extra
- picture delay is introduced in the decoder since the frame used for
- backwards prediction needs to be transmitted to the decoder and
- reconstructed before the intermediate B-pictures in display order can
- be decoded.
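-
- The 15.2 MByte/s figure can be reproduced with simple arithmetic. The
- sketch below assumes 704 x 480 pictures at 30 Hz and one byte per
- sample; those dimensions are a guess at what lies behind the quoted
- number, not something stated in the text.
-
```python
# Sketch: extra read bandwidth for a second (backward) prediction.


def prediction_read_rate(width, height, picture_rate,
                         samples_per_pel=1.5, bytes_per_sample=1):
    """Bytes per second needed to read one full prediction per picture.

    samples_per_pel is 1.5 for 4:2:0 (one luma sample plus a quarter
    sample from each of the two chroma channels).
    """
    return width * height * picture_rate * samples_per_pel * bytes_per_sample


# Assumed Main Level dimensions: 704 x 480 at 30 Hz.
extra_bytes_per_second = prediction_read_rate(704, 480, 30)
```
-
- Under those assumptions the result is 15,206,400 bytes/s, i.e. about
- 15.2 million bytes per second.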
-
- Cable television operators have been particularly averse to B-frames
- since, for CCIR 601 rate video, the extra picture buffer pushes the
- decoder DRAM requirements past the magic 8-Mbit (1 Mbyte) threshold
- into the evil realm of 16 Mbits (2 Mbyte), although 8 Mbits is fine
- for a 352 x 480 B picture sequence. However, cable often forgets that
- DRAM does not come in convenient high-volume (low cost) 8-Mbit
- packages as do friendly 4-Mbit and 16-Mbit packages. In a few years,
- the cost difference between 16 Mbit and 8 Mbit will become
- insignificant compared to the bandwidth savings gained through higher
- compression. For
- the time being, some cable boxes will start with 8-Mbit and allow
- future drop-in upgrades to the full 16-Mbit.
-
-
- How are interlaced and progressive pictures indicated in
- MPEG?
-
- The following tree may help illustrate the possible layers of
- progressive and interlaced coding modes:
-
-
-
-               MPEG-2 sequence
-               /             \
-    progressive           interlaced sequence
-    sequence                /            \
-                   Field picture      Frame picture
-                                       /         \
-                                      /           \
-                Frame or field prediction    Frame MB prediction only
-                      /        \
-                Field dct    Frame dct
-
-
-
- What does it mean to be compliant with MPEG ?
-
- There are two areas of conformance/compliance in MPEG:
-
- 1. Compliant bitstreams
- 2. Compliant decoders
-
- Technically speaking, video bitstreams consisting entirely of I-frames
- are syntactically compliant with the MPEG specification. The I-frame
- sequence simply utilizes a rather limited subset of the full syntax.
- Compliant bitstreams must obey the range limits (e.g. motion vectors
- ranges, bit rates, frame rates, buffer sizes) and permitted syntax
- elements in the bitstream (e.g. chroma_format, B-pictures, etc).
-
- Decoders, however, must be able to decode all combinations of legal
- bitstreams. For example, a decoder which is incapable of decoding P or
- B frames is definitely not a Main Profile or Constrained Parameters
- decoder! Likewise, full arithmetic precision must be obeyed before any
- decoder can be called "MPEG compliant." The IDCT, inverse quantizer,
- and motion compensated predictor must meet the accuracy requirements
- defined in the MPEG document. Real-time conformance is more complicated
- to measure than arithmetic precision, but it is reasonable to expect that
- decoders that skip frames on reasonable bitstreams are not likely to be
- considered compliant.
-
- What are Profiles and Levels?
-
- A. MPEG-2 Video Main Profile and Main Level is analogous to MPEG-1's
- CPB, with sampling limits at CCIR 601 parameters (720x480x30 Hz or
- 720x576x25 Hz). "Profiles" limit syntax (i.e. algorithms), whereas
- "Levels" limit coding parameters (sample rates, frame dimensions, coded
- bitrates, etc.). Together, Video Main Profile and Main Level
- (abbreviated as MP@ML) normalize complexity within feasible limits of
- 1994 VLSI technology (0.5 micron), yet still meet the needs of the
- majority of applications. MP@ML is the conformance point for most cable
- and satellite TV systems.
-
- [insert a description of each Profiles and Levels here]
-
- Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
-
- A. Yes. The MPEG-1 syntax permits sampling dimensions as high as 4095 x
- 4095 x 60 frames per second. The MPEG most people think of as "MPEG-1"
- is really a kind of subset known as Constrained Parameters bitstream
- (CPB).
-
- What are Constrained Parameters Bitstreams?
-
- MPEG-1 CPB are a limited set of sampling and bitrate parameters
- designed to normalize decoder computational complexity, buffer size,
- and memory bandwidth while still addressing the widest possible range
- of applications. The parameter limits were intentionally designed to
- permit decoder implementations integrated with 4 Megabits (512 Kbytes)
- of DRAM.
-
- Bitstream Parameter     Limit
- -------------------     --------------
- pixels/line             704
- lines/frame             480 or 576
- pixels/frame            101,376 pixels
- pixels/second           2,534,400
- frames/sec              30 Hz
- bit rate                1.86 Mbit/sec
- buffer size             40 Kbytes
-
-
- The sampling limits of CPB are bounded at the ever popular SIF rate:
- 396 macroblocks (101,376 pixels) per picture if the picture rate is
- less than or equal to 25 Hz, and 330 macroblocks (84,480 pixels) per
- picture if the picture rate is 30 Hz. The MPEG nomenclature loosely
- defines a pixel or "pel" as a unit vector containing a complete
- luminance sample and one fractional (0.25 in 4:2:0 format) sample from
- each of the two chrominance (Cb and Cr) channels. Thus, the
- corresponding bandwidth figure can be computed as:
-
- 352 samples/line x 240 lines/picture x 30 pictures/sec x 1.5
- samples/pixel
-
- or 3.8 Ms/s (million samples/sec) including chroma, but not including
- blanking intervals. Since most decoders are capable of sustaining VLC
- decoding at a faster rate than 1.8 Mbit/sec, the coded video bitrate
- has become the most often waived parameter of CPB. An encoder which
- intelligently employs the syntax tools should achieve SIF quality
- saturation at about 2 Mbit/sec, whereas an encoder producing streams
- containing only I (Intra) pictures might require as much as 8 Mbit/sec
- to achieve the same video quality.
-
- Why is Constrained Parameters so important?
-
- A. It is an optimum point that allows (just barely) cost effective
- VLSI implementations in 1992 technology (0.8 microns). It also
- implies a nominal guarantee of interoperability for decoders and a
- reasonable class of performance for encoders. Since CPB is the most
- popular canonical MPEG-1 conformance point, MPEG devices which are not
- capable of at least meeting SIF rates are usually not considered to be
- true MPEG by industry.
-
- Picture buffers (i.e. "frame stores") and coded data buffering
- requirements for MPEG-1 CPB fit just snugly into 4 Mbit of memory
- (DRAM).
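-
- The "just snugly" claim can be checked with a back-of-the-envelope
- calculation. The sketch below assumes three full picture buffers (the
- upper end of the 2.5 to 3 quoted earlier for IPB sequences), 8-bit
- 4:2:0 samples, and the 40 Kbyte coded data buffer from the CPB table;
- the function names are hypothetical.
-
```python
# Sketch: does an MPEG-1 CPB decoder fit into a 4-Mbit DRAM?

DRAM_4MBIT = 4 * 1024 * 1024   # bits
CODED_BUFFER_BITS = 40 * 1024 * 8  # 40 Kbytes of coded data buffering


def picture_bits(width, height):
    """Bits in one 4:2:0 picture: 1.5 samples per pel at 8 bits each."""
    return width * height * 12


def decoder_memory_bits(width, height, picture_buffers=3):
    """Picture stores plus the coded data buffer, in bits."""
    return picture_buffers * picture_bits(width, height) + CODED_BUFFER_BITS


sif_625 = decoder_memory_bits(352, 288)  # 396 macroblocks, 25 Hz case
sif_525 = decoder_memory_bits(352, 240)  # 330 macroblocks, 30 Hz case
```
-
- Under these assumptions the 352 x 288 case needs 3,977,216 bits,
- leaving only about 5% of a 4,194,304-bit DRAM unused.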
-
- Who uses constrained parameters bitstreams?
-
- A. Principal CPB applications are Compact Disc video (White Book or
- CD-I) and desktop video. Set-top TV decoders fall into a higher
- sampling rate category known as "CCIR 601" or "Broadcast rate," which
- as a rule of thumb, has sampling dimensions and bandwidth 4 times
- that of SIF (Constrained Parameter sample rate limit).
-
- Are there ways of circumventing constrained parameters bitstreams for
- SIF class applications and decoders ?
-
- A. Yes, some. Remember that CPB limits pictures by macroblock count
- (or pixels/frame). 416 x 240 x 24 Hz sampling rates are still within
- these constraints. Deviating from 352 samples/line could throw off many
- decoder implementations which possess limited horizontal sample rate
- conversion abilities. Some decoders do in fact include a few rate
- conversion modes, with a filter usually implemented via binary taps
- (shifts and adds). Likewise, the target sample rates are usually
- limited to a few fixed values or ratios (e.g. 640, 540, 480
- pixels/line, etc.). Future MPEG decoders will likely include on-chip
- arbitrary sample rate converters,
- perhaps capable of operating in the vertical direction (although there
- is little need of this in applications using standard TV monitors where
- line count is constant, with the possible exception of windowing in
- cable box graphical user interfaces).
-
- Also, many CD videos are letterboxed at the 16:9 aspect ratio. The
- actual coded and display sampling dimensions are 384 x 216 (note
- 384/216 = 16/9). These programs are typically movies coded at the more
- manageable 24 frames/sec.
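-
- Whether a given picture format stays within the CPB sampling limits
- can be checked mechanically. The Python sketch below tests only the
- macroblock count, macroblock rate, and picture rate limits quoted in
- this FAQ (396 macroblocks per picture, 396 x 25 macroblocks per
- second, 30 Hz); it deliberately ignores the pixels/line, lines/frame,
- bit rate, and buffer size limits, and its names are hypothetical.
-
```python
# Sketch: check picture dimensions against the CPB sampling limits.
import math

MAX_MACROBLOCKS = 396            # 101,376 pixels per picture
MAX_MACROBLOCK_RATE = 396 * 25   # 2,534,400 pixels per second
MAX_PICTURE_RATE = 30            # Hz


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover the picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def within_cpb_sampling_limits(width, height, picture_rate):
    """True if the format respects the three limits checked here."""
    mbs = macroblocks(width, height)
    return (mbs <= MAX_MACROBLOCKS
            and mbs * picture_rate <= MAX_MACROBLOCK_RATE
            and picture_rate <= MAX_PICTURE_RATE)
```
-
- With this check, 352 x 240 x 30 Hz, 352 x 288 x 25 Hz, 416 x 240 x 24
- Hz, and 384 x 216 x 24 Hz all pass, while 352 x 288 x 30 Hz exceeds
- the macroblock rate limit.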
-
- Are there any other conformance points like CPB for MPEG-1?
-
- A. Undocumented ones, yes. A second generation of decoder chips
- emerged on the market about 1 year after the first wave of SIF-class
- decoders. Both LSI Logic and SGS-Thomson introduced CCIR 601 class
- MPEG-1 video decoders to fill in the gap between canonical MPEG-1 (SIF)
- and the emergence of Main Profile at Main Level (CCIR 601) MPEG-2
- decoders. Under non-disclosure agreement, C-Cube had the CL-950,
- although since Q2'94, the CL-9100 is now the full MPEG-2 successor in
- production. MPEG-1 decoders in the CCIR 601 class, or Main Level, were
- all too often called MPEG-1.5 or MPEG-1++ decoders. For the first year
- of operation, the Direct Broadcasting Satellite service in the United
- States (Hughes Direct TV and Hubbard's USSB) called only upon MPEG-1
- syntax to represent interlaced video before switching to full MPEG-2
- syntax.
-
- What frame rates are permitted in MPEG?
-
- A limited set is available for the choosing in MPEG-1 and the currently
- defined set of Profiles and Levels of MPEG-2, although "tricks" could
- be played with Systems-layer Time Stamps to convey non-standard picture
- rates. The set is: 23.976 Hz (3-2 pulldown NTSC), 24 Hz (Film), 25 Hz
- (PAL/SECAM or 625/50 video), 29.97 Hz (NTSC), 30 Hz (drop-frame NTSC or
- component 525/60), 50 Hz (double-rate PAL), 59.94 Hz (double-rate
- NTSC), and 60 Hz (double-rate, drop-frame NTSC/component 525/60
- video).
-
- Only 23.976, 24, 25, 29.97, and 30 Hz are within the conformance space
- of Constrained Parameter Bitstreams and Main Level.
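-
- For reference, the permitted rates can be written as a small lookup
- table. The sketch below uses the 4-bit code values of the sequence
- header as best recalled (the field is called frame_rate_code in
- MPEG-2); the NTSC-related rates are exact ratios with a denominator of
- 1001. The helper name and the choice of Python's Fraction type are
- assumptions made for illustration.
-
```python
# Sketch: sequence-header frame rate codes as exact fractions.
from fractions import Fraction

FRAME_RATE_CODES = {
    1: Fraction(24000, 1001),  # approx. 23.976 Hz
    2: Fraction(24),
    3: Fraction(25),
    4: Fraction(30000, 1001),  # approx. 29.97 Hz
    5: Fraction(30),
    6: Fraction(50),
    7: Fraction(60000, 1001),  # approx. 59.94 Hz
    8: Fraction(60),
}


def allowed_in_cpb_and_main_level(code):
    """Only the five rates up to 30 Hz fall inside CPB and Main Level."""
    return code in FRAME_RATE_CODES and FRAME_RATE_CODES[code] <= 30
```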
-
-
- What areas can be improved upon to create a better syntax
- than MPEG?
-
- Several improvements can be made to the MPEG syntax while remaining
- within the framework of block based coding. As implementation
- technology improves with time, the ratio of computation to sample rate
- can be increased for the same implementation cost. With each
- evolutionary stage in the shrinking of the semiconductor lithography
- process (line width), more complex coding methods become economically
- realizable. Some of the well-known or well-anticipated areas for
- improvement are described below:
-
- Intra coding:
- For intra pictures, subband methods such as wavelets combined with
- improved quantization and entropy coders could gain as much as 2-4 dB
- over MPEG Intra pictures. The problem becomes more complex when
- considering the coding of Intra Macroblocks in mixed pictures, such as
- P or B, since the extent of a subband must, in the simplest of
- schemes, be limited to the dimensions of a macroblock.
-
-
- Prediction error coding
- One of the strongest gripes against MPEG is the use of the DCT for
- decorrelation of prediction error blocks. One explanation is that the
- DCT is suited for the statistical correlation of intra signals, but
- less suited for the statistics of prediction error (Non-Intra) signals.
- One common proposal is to replace the DCT with a Vector Quantizer.
- Prediction error (Non-intra) blocks typically contain far fewer bits
- than intra blocks. (The bits that comprise a Non-intra block can be
- thought of as having been previously distributed over previous blocks
- in previous pictures in the form of coefficients and side
- information...)
-
- Finer coding unit granularity:
- The size of the transform block could be made smaller, larger, or both
- (myriad of different sizes). Likewise, the size of the motion
- compensation block can be made larger or smaller. The cost is more
- complex semantics (more decoder complexity) and the overhead bits to
- select the block size. Instead of sharing the same side information,
- the blocks within the macroblock could be assigned their own motion
- vectors, macroblock quantization scale factors, etc.
-
- Many advanced techniques were investigated by MPEG during the
- formative stages of the specification, but were eventually eliminated
- for falling below a threshold set for coding gain vs. implementation
- complexity. Often, proposals presented a significant departure from the
- main stream algorithms under consideration. Each bit added to the
- syntax, or rule added to the semantics represents several gates to a
- silicon implementation, or from a software perspective, an extra table,
- if-then or case statement at multiple points in the decoding program.
-
-
-
- What are the similarities and differences between MPEG and
- H.263?
-
- During its formative stages, H.263 was known as "H.26P" or "H.26X". It
- is an ITU-T standard for low-bitrate video and audio teleconferencing.
- It is designed to be more efficient (at least 2dB) than H.261 for bit
- rates below 64 kbits/sec (ISDN B channel). The primary target bit
- rate, approximately 27,000 bits/sec, is the payload rate of the V.34
- (a.k.a "V.Fast" or "V.Last") modem standard. In a typical scenario, 20
- kbit/sec would be allocated for the video portion, and 6.5 kbit/sec for
- the speech portion.
-
- Since the H.261 syntax was defined in 1990, techniques and
- implementation power have naturally improved. H.263 collects many of
- the advanced methods proposed during MPEG's formative stages into a
- syntax which shares a common basis more with MPEG-1 video than with
- H.261.
-
- The detailed differences and similarities are summarized below:
-
- Sample rate, precision, and color space:
- H.263 pictures are transmitted with QCIF dimensions. MPEG and JPEG
- allow nearly any picture size to be described in the headers. A fixed
- picture size promotes interoperability by forcing all implementors to
- operate at a common rate, rather than by allowing implementors to get
- away with whatever lowest sample rate the consumer can be tricked into
- buying. Another reason for a fixed sample rate is that, unlike MPEG
- which is generic, H.263 is geared towards a specific application
- (teleconferencing). Other MPEG applications such as CD Video and Cable
- TV define their own fixed parameters. Chromaticity is again YCbCr, 4:2:0
- macroblock structure, and 8 bits of uniform sample precision.
-
- [details deferred]
-
-
-
- How would you describe MPEG to the Data Compression
- expert?
-
- A. MPEG video is a block-based coding scheme.
-
-
- How does MPEG video really compare to TV, VHS, laserdisc ?
-
- A. VHS picture quality can be achieved for film source video at about 1
- million bits per second (with careful application of proprietary
- encoding methods). Objective comparison of MPEG to VHS is complex.
- The luminance response curve of VHS places -3 dB (50% response, the
- common definition of bandlimit) at around analog 2 MHz (digital
- equivalent to 200 samples/line). VHS chroma is considerably less dense
- in the horizontal direction than MPEG's 4:2:0 signal (compare 80
- samples/line equivalent to 176 !!). From a sampling density
- perspective, VHS is superior only in the vertical direction (480
- luminance lines compared to 240). When other analog factors are taken
- into account, such as interfield crosstalk and the TV monitor Kell
- factor, the perceptual vertical advantage becomes much less than 2:1.
- VHS is also prone to such inconveniences as timing errors (an annoyance
- addressed by time base correctors), whereas digital video is fully
- discretized. Duplication processes for pre-recorded VHS tapes at high
- speeds (5 to 15 times real time playback speed) introduce additional
- handicaps. In gist, MPEG-1 at its nominal parameters can match VHS's
- sexy low-pass-filtered look, but for critical sequences, is probably
- overall inferior to a well mastered, well duplicated VHS tape.
-
- With careful coding schemes, broadcast NTSC quality can be approximated
- at about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec for film
- source video. Of course, sports sequences with complex spatial-
- temporal activity should be treated with higher bit rates, in the
- neighborhood of 5 and 6 Mbit/sec. Laserdisc is perhaps the most
- difficult medium to make comparisons with.
-
- First, the video signal encoded onto a laserdisc is composite, which
- lends the signal to the familiar set of artifacts (reduced color
- accuracy of YIQ, moire patterns, crosstalk, etc). The medium's
- bandlimited signal is often defined by laserdisc player manufacturers
- and main stream publications as capable of rendering up to 425 TVL (or
- frequencies with Nyquist at 567 samples/line). An equivalent component
- digital representation would therefore have sampling dimensions of 567
- x 480 x 30 Hz. The carrier-to-noise ratio of a laserdisc video signal
- is typically better than 48 dB. Timing accuracy is excellent,
- certainly better than VHS. Yet some of the clean characteristics of
- laserdisc can be simulated with MPEG-1 signals as low as 1.15 Mbit/sec
- (SIF rates), especially for those areas of medium detail (low spatial
- activity) in the presence of uniform motion (affine motion vector
- fields). The appearance of laserdisc or Super VHS quality can therefore
- be obtained for many video sequences with low bit rates, but for the
- more general class of image sequences, a bit rate ranging from 3 to 6
- Mbit/sec is necessary.
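-
- The step from 425 TVL to 567 samples/line follows from the way TV
- lines are counted: resolution in TVL is normalized to the picture
- height, so the count across the full width of a 4:3 picture is 4/3
- times larger. A small Python sketch, with a hypothetical function name
- and the 4:3 aspect ratio as the stated assumption:
-
```python
# Sketch: convert horizontal resolution in TVL to samples per line.
from fractions import Fraction


def tvl_to_samples_per_line(tvl, aspect_ratio=Fraction(4, 3)):
    """TVL is measured per picture height; scale by width/height."""
    return round(tvl * aspect_ratio)


laserdisc_samples = tvl_to_samples_per_line(425)
```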
-
-
- What are the typical coded sizes for the MPEG frames?
-
- Typical bit sizes for the three different picture types:
- Level                                I        P        B  Average
- ----------------------------   -------  -------  -------  -------
- 30 Hz SIF @ 1.15 Mbit/sec      150,000   50,000   20,000   38,000
- 30 Hz CCIR 601 @ 4 Mbit/sec    400,000  200,000   80,000  130,000
-
-
- Note: the above example is taken from a standard test sequence coded by
- the Test Model method, with an I frame distance of 15 (N = 15), and a P
- frame distance of 3 (M = 3).
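-
- The relationship between the per-type sizes and the average can be
- sketched in Python. With N = 15 and M = 3 a GOP holds one I, four P,
- and ten B pictures. The function names are hypothetical, and the
- sketch assumes N is a multiple of M. Since the quoted sizes are
- rounded "typical" values, the recomputed averages only approximately
- match the Average column.
-
```python
# Sketch: average coded picture size from typical I, P and B sizes.


def gop_picture_counts(n, m):
    """Return (I, P, B) picture counts for GOP length n, anchor spacing m."""
    n_i = 1
    n_p = n // m - 1
    n_b = n - n_i - n_p
    return n_i, n_p, n_b


def average_picture_bits(i_bits, p_bits, b_bits, n=15, m=3):
    """Mean bits per picture over one GOP."""
    n_i, n_p, n_b = gop_picture_counts(n, m)
    return (n_i * i_bits + n_p * p_bits + n_b * b_bits) / n


def implied_bit_rate(avg_bits, picture_rate):
    """Bit rate in bits per second implied by an average picture size."""
    return avg_bits * picture_rate
```
-
- For the SIF row this gives about 36,700 bits per picture (1.10
- Mbit/sec at 30 Hz), and for the CCIR 601 row about 133,300 bits per
- picture (4.00 Mbit/sec at 30 Hz).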
-
- Of course, among differing source material, scene changes, and use of
- advanced encoder models these numbers can be significantly different.
-
- At what bitrates is MPEG-2 video optimal?
-
- The Test subgroup has defined a few example "Sweet spot" sampling
- dimensions and bit rates for MPEG-2:
-
- Dimensions         Coded rate   Application
- -----------------  -----------  ---------------------------------------
- 352x480x24 Hz      2 Mbit/sec   Equivalent to VHS quality. Intended for
- (progressive)                   film source video. Half horizontal 601
-                                 (HHR). Looks almost broadcast NTSC
-                                 quality.
-
- 544x480x30 Hz      4 Mbit/sec   PAL broadcast quality (nearly full
- (interlaced)                    capture of 5.4 MHz luminance signal).
-                                 544 samples matches the width of a 4:3
-                                 picture windowed within 720 sample/line
-                                 16:9 aspect ratio via pan&scan.
-
- 704x480x30 Hz      6 Mbit/sec   Full CCIR 601 sampling dimensions.
- (interlaced)
-
-
- These numbers may be too ambitious. Bit rates of 3, 6, and 8 Mbit/sec
- respectively provide transparent quality for the above application
- examples when generated by a reasonably sophisticated encoder.
-
- Why does film perform so well with MPEG ?
-
-
- 1. The frame rate is 24 Hz (instead of 30 Hz) which is a savings of
- some 20%.
-
- 2. Film source video is inherently progressive. Hence no fussy
- interlaced spectral frequencies.
-