Source from http://www.erg.abdn.ac.uk/research/future-net/digital-video/mpeg2-trans.html.

MPEG-2 Transmission

The MPEG-2 standards define how to format the various component parts of a multimedia programme (which may consist of: MPEG-2 compressed video, compressed audio, control data and/or user data). They also define how these components are combined into a single synchronous transmission bit stream. The process of combining the streams is known as multiplexing.

The multiplexed stream may be transmitted over a variety of links. Standards and products are (or will soon be) available for:

Radio Frequency Links (UHF/VHF)
Digital Broadcast Satellite Links
Cable TV Networks
Standard Terrestrial Communication Links (PDH, SDH)
Microwave Line of Sight (LoS) Links (wireless)
Digital Subscriber Links (ADSL family)
Packet / Cell Links (ATM, IP, IPv6, Ethernet)

Many of these formats are being standardised by the DVB project.

Building the MPEG Bit Stream

To understand how the component parts of the bit stream are multiplexed, we need to first look at each component part. The most basic component is known as an Elementary Stream in MPEG. A programme (perhaps most easily thought of as a television programme, or a Digital Versatile Disk (DVD) track) contains a combination of elementary streams (typically one for video, one or more for audio, control data, subtitles, etc).

Elementary Stream (ES)

Each Elementary Stream (ES) output by an MPEG audio, video or (in some cases) data encoder contains a single type of (usually compressed) signal. There are various forms of ES, including:

Digital Control Data
Digital Audio (sampled and compressed)
Digital Video (sampled and compressed)
Digital Data (synchronous, or asynchronous)

For video and audio, the data is organised into access units, each representing a fundamental unit of encoding. For example, in video, an access unit will usually be a complete encoded video frame.

Packetised Elementary Stream (PES)

Each ES is input to an MPEG-2 processor (e.g. a video compressor or data formatter) which accumulates the data into a stream of Packetised Elementary Stream (PES) packets. A PES packet may be a fixed (or variable) sized block, carrying up to 65535 bytes after its 6 byte protocol header (the limit of the 2 byte length field). A PES packet is usually organised to contain an integral number of ES access units.

The PES header starts with a 3 byte start code, followed by a one byte stream ID and a 2 byte length field.

The following well-known stream IDs are defined in the MPEG standard:

110x xxxx - MPEG-2 audio stream number x xxxx.
1110 yyyy - MPEG-2 video stream number yyyy.
1111 0010 - MPEG-2 DSM-CC control packets.

The next field contains the PES Indicators. These provide additional information about the stream to assist the decoder at the receiver. The following indicators are defined:

PES_Scrambling_Control - Defines whether scrambling is used, and the chosen scrambling method.
PES_Priority - Indicates priority of the current PES packet.
data_alignment_indicator - Indicates if the payload starts with a video or audio start code.
copyright information - Indicates if the payload is copyright protected.
original_or_copy - Indicates if this is the original ES.

A one byte flags field completes the PES header. This defines the following optional fields, which if present, are inserted before the start of the PES payload.

Presentation Time Stamp (PTS) and possibly a Decode Time Stamp (DTS) - For audio / video streams these time stamps may be used to synchronise a set of elementary streams and control the rate at which they are replayed by the receiver.
Elementary Stream Clock Reference (ESCR)
Elementary Stream rate - Rate at which the ES was encoded.
Trick Mode - indicates the video/audio is not the normal ES, e.g. after DSM-CC has signalled a replay.
Copyright Information - set to 1 to indicate a copyright ES.
CRC - this may be used to monitor errors in the previous PES packet.
PES Extension Information - may be used to support MPEG-1 streams.

The PES packet payload includes the ES data. The information in the PES header is, in general, independent of the transmission method used.
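
The fixed six-byte part of the PES header described above (start code, stream ID, length) can be sketched in Python as follows. This is a minimal illustration, not a full PES parser: the function names are hypothetical, the start code value 0x000001 is taken from the MPEG-2 Systems specification rather than from the text above, and the indicator and flags bytes that follow are not decoded.

```python
import struct

# Value of the 3 byte start code prefix (from the MPEG-2 Systems specification).
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def parse_pes_header(data: bytes) -> dict:
    """Decode the fixed 6 byte PES header: start code, stream ID, length."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[:3] != PES_START_CODE_PREFIX:
        raise ValueError("missing 0x000001 start code prefix")
    stream_id, length = struct.unpack(">BH", data[3:6])
    return {
        "stream_id": stream_id,
        "kind": classify_stream_id(stream_id),
        "length": length,  # bytes following the length field
    }


def classify_stream_id(stream_id: int) -> str:
    """Map a stream ID onto the three well-known classes listed above."""
    if stream_id & 0xE0 == 0xC0:   # 110x xxxx
        return f"audio stream {stream_id & 0x1F}"
    if stream_id & 0xF0 == 0xE0:   # 1110 yyyy
        return f"video stream {stream_id & 0x0F}"
    if stream_id == 0xF2:          # 1111 0010
        return "DSM-CC"
    return "other"
```

For example, the six bytes 00 00 01 E0 00 10 would be reported as video stream 0 with a length field of 16.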

MPEG-2 Multiplexing

The MPEG-2 standard allows two forms of multiplexing:

MPEG Program Stream - A group of tightly coupled PES packets referenced to the same time base. Such streams are suited for transmission in a relatively error-free environment and enable easy software processing of the received data. This form of multiplexing is used for video playback and for some network applications.
MPEG Transport Stream - Each PES packet is broken into fixed-sized transport packets, forming a general purpose way of combining one or more streams, possibly with independent time bases. This is suited for transmission in which there may be potential packet loss or corruption by noise, and/or where there is a need to send more than one programme at a time.

Combining Elementary Streams from encoders into a Transport Stream (red) or a Programme Stream (yellow). The Service Information (SI) component on the transport stream is not shown.

The Programme Stream is widely used in digital video storage devices, and also where the video is reliably transmitted over a network (e.g. video-clip download). Digital Video Broadcast (DVB) uses the MPEG-2 Transport Stream over a wide variety of underlying networks. Since both the Program Stream and Transport Stream multiplex a set of PES inputs, interoperability between the two formats may be achieved at the PES level.

MPEG Transport Streams

A transport stream consists of a sequence of fixed sized transport packets of 188 B. Each packet comprises 184 B of payload and a 4 B header. One of the items in this 4 B header is the 13 bit Packet Identifier (PID) which plays a key role in the operation of the Transport Stream.

The format of the transport stream is described using the figure below (a later section describes the detailed format of the TS packet header). This figure shows two elementary streams sent in the same MPEG-2 transport multiplex. Each packet is associated with a PES through the setting of the PID value in the packet header (the values of 64 and 51 in the figure). The audio packets have been assigned PID 64, and the video packets PID 51 (these are arbitrary, but different values). As is usual, there are more video than audio packets, but you may also note that the two types of packets are not evenly spaced in time. The MPEG-TS is not a time division multiplex: packets with any PID may be inserted into the TS at any time by the TS multiplexor. If no packets are available at the multiplexor, it inserts null packets (denoted by a PID value of 0x1FFF) to retain the specified TS bit rate. The multiplexor also does not synchronise the two PESs; indeed the encoding and decoding delay for each PES may be (and usually is) different. A separate process is therefore required to synchronise the two streams (see below).

Single Program Transport Stream (Audio and Video PES).
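
The null-packet behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the multiplexor's output is modelled as a list of slots in which None marks a slot with nothing to send, the helper names are hypothetical, and filling the null packet payload with 0xFF is a common convention rather than something the text requires.

```python
TS_PACKET_SIZE = 188
NULL_PID = 0x1FFF


def make_null_packet() -> bytes:
    """Build a 188 byte null packet (PID 0x1FFF, payload only)."""
    header = bytes([
        0x47,                    # synchronisation byte
        (NULL_PID >> 8) & 0x1F,  # flags clear, upper 5 bits of the PID
        NULL_PID & 0xFF,         # lower 8 bits of the PID
        0x10,                    # not scrambled, payload only, counter 0
    ])
    return header + b"\xFF" * (TS_PACKET_SIZE - len(header))


def fill_slots(slots):
    """Replace empty multiplexor slots (None) with null packets."""
    null = make_null_packet()
    return [packet if packet is not None else null for packet in slots]


def pid_of(packet: bytes) -> int:
    """Extract the 13 bit PID from a TS packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]
```

A receiver filtering on PID 51 or 64 would simply discard the packets whose pid_of() value is 0x1FFF.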

Transmission of the MPEG-TS

Although the MPEG TS may be directly used over a wide variety of media (as in DVB), it may also be used over a communication network. It is designed to be robust with short frames, each one being protected by a strong error correction mechanism. It is constructed to match the characteristics of the generic radio or cable channel and expects an uncorrected Bit Error Rate (BER) of better than 10^-10. (The different variants of DVB each have their own outer coding and modulation methods designed for the particular environment.)

The MPEG-2 Transport Stream is so called to signify that it is the input to the Transport Layer in the ISO Open System Interconnection (OSI) seven-layer network reference model. It is not, in itself, a transport layer protocol and no mechanism is provided to ensure the reliable delivery of the transported data. MPEG-2 transport relies on underlying layers for such services. It requires the underlying layer to identify the transport packets, and to indicate in the transport packet header when a transport packet has been erroneously transmitted.

When the MPEG-TS is used over a lower layer network protocol, the lower layer must identify the start of each transport packet, and indicate in the transport packet header when a transport packet has been erroneously received. The MPEG TS packet size also maps neatly onto Asynchronous Transfer Mode (ATM): two 188 B TS packets plus 8 B of overhead (associated with the ATM Adaptation Layer (AAL)) exactly fill the 48 B payloads of eight ATM cells.
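
The ATM arithmetic can be checked with a short calculation. The sketch below assumes the 8 B of overhead is the AAL5 trailer and that each ATM cell carries 48 B of payload; the function name is hypothetical.

```python
TS_PACKET_SIZE = 188
ATM_CELL_PAYLOAD = 48  # payload bytes in a 53 byte ATM cell
AAL5_TRAILER = 8       # assumed: the 8 B overhead is the AAL5 trailer


def cells_for_ts_packets(n_packets: int):
    """Return (cells, padding_bytes) for n TS packets sent as one AAL5 PDU."""
    pdu = n_packets * TS_PACKET_SIZE + AAL5_TRAILER
    cells = -(-pdu // ATM_CELL_PAYLOAD)  # ceiling division
    padding = cells * ATM_CELL_PAYLOAD - pdu
    return cells, padding
```

Under these assumptions two TS packets need 2 x 188 + 8 = 384 B, which is exactly eight cells with no padding, whereas a single TS packet would occupy five cells and waste 44 B.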

Single and Multiple Program Transport Streams

A TS may correspond to a single TV programme, or multimedia stream (e.g. with a video PES and an audio PES). This type of TS is normally called a Single Programme Transport Stream (SPTS).

An SPTS contains all the information required to reproduce the encoded TV channel or multimedia stream. It may contain only an audio and a video PES, but in practice there will be other types of PES as well. Each PES shares a common timebase. Although some equipment outputs and uses SPTS, this is not the normal form transmitted over a DVB link.

In most cases one or more SPTS streams are combined to form a Multiple Programme Transport Stream (MPTS). This larger aggregate also contains all the control information (Program Specific Information (PSI)) required to co-ordinate the DVB system, and any other data which is to be sent.

Streams supported by the MPTS

Most transport streams consist of a number of related elementary streams (e.g. the video and audio of a TV programme). The decoding of the elementary streams may need to be co-ordinated (synchronised) to ensure that the audio playback is in synchronism with the corresponding video frames. Each stream may be tightly synchronised (usually necessary for digital TV programs, or for digital radio programs), or not synchronised (in the case of programs offering downloading of software or games, as an example). To help synchronisation, time stamps may (optionally) be sent in the transport stream.

There are two types of time stamps:

The first type is usually called a reference time stamp. This time stamp is the indication of the current time. Reference time stamps are to be found in the PES syntax (ESCR), in the program stream syntax (System Clock Reference, SCR), and in the transport packet adaptation field (Program Clock Reference, PCR).
The second type of time stamp is called Decoding Time Stamp (DTS) or Presentation Time Stamp (PTS). These time stamps are inserted close to the material to which they refer (normally in the PES packet header). They indicate the exact moment at which a video frame or an audio frame has to be decoded or presented to the user respectively. These rely on reference time stamps for operation.
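
As an illustration of how a PTS relates to a reference clock, the sketch below packs and unpacks the 5 byte PTS field and compares a PTS against a reference time. The details are assumptions drawn from the MPEG-2 Systems specification rather than from the text above: a 33 bit value counted in 90 kHz ticks, split into 3, 15 and 15 bits with a marker bit after each group. The function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000       # assumed PTS/DTS tick rate
PTS_MASK = (1 << 33) - 1    # PTS/DTS are 33 bit counters


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33 bit PTS into the 5 byte field, with marker bits set."""
    pts &= PTS_MASK
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,  # bits 32..30
        (pts >> 22) & 0xFF,                               # bits 29..22
        (((pts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15
        (pts >> 7) & 0xFF,                                # bits 14..7
        ((pts & 0x7F) << 1) | 1,                          # bits 6..0
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33 bit PTS from the 5 byte field."""
    return (
        ((field[0] >> 1) & 0x07) << 30
        | field[1] << 22
        | ((field[2] >> 1) & 0x7F) << 15
        | field[3] << 7
        | ((field[4] >> 1) & 0x7F)
    )


def seconds_until(pts: int, reference_90khz: int) -> float:
    """Seconds from the reference time to the PTS (negative if late)."""
    delta = (pts - reference_90khz) & PTS_MASK
    if delta >= 1 << 32:        # interpret the upper half as negative
        delta -= 1 << 33
    return delta / PTS_CLOCK_HZ
```

With these assumptions, a PTS of 180000 against a reference of 90000 means the frame is due one second from now.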

Signalling Tables

For a user to receive a particular transport stream, the user must first determine the PID being used, and then filter packets which have a matching PID value. To help the user identify which PID corresponds to which programme, a special set of streams, known as Signalling Tables, are transmitted with a description of each program carried within the MPEG-2 Transport Stream. Signalling tables are sent separately to PES, and are not synchronised with the elementary streams (i.e. they are an independent control channel).

DVB Signalling Tables and Transport Layer PIDs

The tables (called Program Specific Information (PSI) in MPEG-2) consist of a description of the elementary streams which need to be combined to build programmes, and a description of the programmes. Each PSI table is carried in a sequence of PSI Sections, which may be of variable length (but are usually small, c.f. PES packets). Each section is protected by a CRC (checksum) to verify the integrity of the table being carried. The length of a section allows a decoder to identify the next section in a packet. A PSI section may also be used for down-loading data to a remote site. Tables are sent periodically by including them in the transmitted transport multiplex.
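
The CRC check on a PSI section can be sketched as below. It assumes the CRC-32 variant used by MPEG-2 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection and no final inversion) and that the checksum occupies the last four bytes of the section, most significant byte first. With that variant, running the CRC over a complete section, checksum included, leaves zero when the section is intact. The function names are hypothetical.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32 as assumed for MPEG-2 PSI sections."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def section_is_valid(section: bytes) -> bool:
    """True if the section, including its trailing CRC, checks out."""
    return crc32_mpeg2(section) == 0
```

A bitwise loop is used for clarity; a practical decoder would more likely use a 256-entry lookup table.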

MPEG-2 Signalling Tables

PAT - Program Association Table (lists the PIDs of tables describing each programme). The PAT is sent with the well-known PID value of 0x000.
CAT - Conditional Access Table (defines the type of scrambling used and the PID values of transport streams which contain the conditional access management and entitlement information (EMM)). The CAT is sent with the well-known PID value of 0x001.
PMT - Program Map Table (defines the set of PIDs associated with a programme, e.g. audio, video, ...).
NIT - Network Information Table (PID=0x010, contains details of the bearer network used to transmit the MPEG multiplex, including the carrier frequency).
DSM-CC - Digital Storage Media Command and Control (messages to the receivers).

Programme Service Information (SI) provided by MPEG-2 and used by DVB

To identify the required PID to de-multiplex a particular PES, the user searches for a description in a particular table, the Program Association Table (PAT). This lists all programmes in the multiplex. For each programme, the PAT gives the PID of a Programme Map Table (PMT), carried as a separate PSI section, which in turn lists the set of PIDs (one for each PES) that make up the programme. There is one PMT per programme. DVB also adds a number of additional tables including those shown below.
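
The PAT lookup can be sketched as follows. The section layout used here is an assumption taken from the MPEG-2 Systems specification, not from the text above: eight bytes of section header, then one 4 byte entry per programme (a 16 bit programme number followed by a 13 bit PID), then a 4 byte CRC. The function name is hypothetical, and the CRC is not verified in this sketch.

```python
def parse_pat(section: bytes) -> dict:
    """Map each programme number in a PAT section to the PID of its PMT."""
    if section[0] != 0x00:
        raise ValueError("not a PAT (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4      # stop before the 4 byte CRC
    programmes = {}
    for i in range(8, end, 4):        # entries follow the 8 header bytes
        programme_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        # Programme number 0 is special: its PID points at the NIT.
        programmes[programme_number] = pid
    return programmes
```

A receiver would then filter on the PID returned for the chosen programme, read the PMT found there, and finally filter on the PIDs that the PMT lists for the audio and video PESs.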

DVB Signalling Tables

In addition to the PSI carried in each multiplex (MPTS), a service also carries information relating to the service as a whole. Since a service may use a number of MPTSs to send all the required programmes, this information is provided in the Service Information (SI) tables defined by DVB. Each table refers to the MPTS in which it is carried and any other MPTSs which carry other TS which are offered as a part of the same service.

BAT - Bouquet Association Table (groups services into logical groups).
SDT - Service Description Table (describes the name and other details of services).
TDT - Time and Date Table (PID=0x014, provides present time and date).
RST - Running Status Table (PID=0x013, provides status of a programmed transmission, allows for automatic event switching).
EIT - Event Information Table (PID=0x012, provides details of a programmed transmission).

Service Information (SI) provided by DVB

Most viewers have little knowledge of the operation of these tables and interact with the decoder through a graphical or textual programme guide.

Format of a Transport Stream Packet

Each MPEG-2 TS packet carries 184 B of payload data prefixed by a 4 B (32 bit) header.

The header has the following fields:

The header starts with a well-known Synchronisation Byte (8 bits). This has the bit pattern 0x47 (0100 0111).
A set of three flag bits are used to indicate how the payload should be processed.
1. The first flag indicates a transport error.
2. The second flag indicates the start of a payload (payload_unit_start_indicator)
3. The third flag indicates the transport priority.
The flags are followed by a 13 bit Packet Identifier (PID). This is used to uniquely identify the stream to which the packet belongs (e.g. PES packets corresponding to an ES) generated by the multiplexer. The PID allows the receiver to differentiate the stream to which each received packet belongs. Some PID values are predefined and are used to indicate various streams of control information. A packet with an unknown PID, or one with a PID which is not required by the receiver, is silently discarded. The particular PID value of 0x1FFF is reserved to indicate that the packet is a null packet (and is to be ignored by the receiver).
The two scrambling control bits are used by conditional access procedures to encrypt the payload of some TS packets.
Two adaptation field control bits, which may take four values:
1. 01 - no adaptation field, payload only
2. 10 - adaptation field only, no payload
3. 11 - adaptation field followed by payload
4. 00 - RESERVED for future use
Finally there is a half byte Continuity Counter (4 bits), incremented for each successive payload-carrying packet with the same PID, which allows a receiver to detect lost packets.
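
The header layout listed above can be decoded with a few shifts and masks. The following is a minimal sketch; the function name and dictionary keys are hypothetical.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4 byte header of a 188 byte transport packet."""
    if len(packet) != 188:
        raise ValueError("a transport packet is exactly 188 bytes")
    if packet[0] != 0x47:
        raise ValueError("lost synchronisation: first byte is not 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,             # 13 bits
        "scrambling_control": (b3 >> 6) & 0x03,     # 2 bits
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,            # 4 bits
    }
```

For a header of 47 40 33 17 this reports the start of a payload unit on PID 51, payload only, with a continuity counter of 7.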

Two options are possible for inserting PES data into the TS packet payload:

The simplest option, from both the encoder and receiver viewpoints, is to send only one PES (or a part of a single PES) in a TS packet. This allows the TS packet header to indicate the start of the PES, but since a PES packet may have an arbitrary length, also requires the remainder of the TS packet to be padded, ensuring correct alignment of the next PES to the start of a TS packet. In MPEG-2 the padding value is the hexadecimal byte 0xFF.
In general a given PES packet spans several TS packets so that the majority of TS packets contain continuation data in their payloads. When a PES packet is starting, however, the payload_unit_start_indicator bit is set to '1', which means the first byte of the TS payload contains the first byte of the PES packet header. Only one PES packet can start in any single TS packet. The TS header also contains the PID so that the receiver can accept or reject PES packets at a high level without burdening the receiver with too much processing. This has an impact on short PES packets.

MPEG PES mapping onto the MPEG-2 TS
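
The mapping can be reversed at the receiver by collecting the payloads for one PID and starting a new PES packet whenever payload_unit_start_indicator is set. The sketch below makes simplifying assumptions: function names are hypothetical, the continuity counter is not checked, and any padding is assumed to sit in an adaptation field (whose first byte gives its length) so that it can be skipped.

```python
def ts_payload(packet: bytes) -> bytes:
    """Return the payload bytes of a TS packet, skipping any adaptation field."""
    control = (packet[3] >> 4) & 0x03
    if control == 0b01:                 # payload only
        return packet[4:]
    if control == 0b11:                 # adaptation field, then payload
        return packet[5 + packet[4]:]   # packet[4] is the field length
    return b""                          # adaptation field only, or reserved


def reassemble_pes(packets, wanted_pid):
    """Yield the bytes of each PES packet carried on one PID."""
    buffer = None                       # None until the first start is seen
    for packet in packets:
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid != wanted_pid:
            continue                    # other PIDs are silently discarded
        if packet[1] & 0x40:            # payload_unit_start_indicator
            if buffer:
                yield bytes(buffer)
            buffer = bytearray()
        if buffer is not None:
            buffer += ts_payload(packet)
    if buffer:
        yield bytes(buffer)             # flush whatever was collected last
```

Data arriving before the first start indicator is dropped, since the receiver cannot know where in a PES packet it belongs.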

Optional Transport Packet Adaptation Field

The presence of an adaptation field is indicated by the adaptation field control bits in a transport stream packet. If present, the adaptation field directly follows the 4 B packet header, before any user payload data. It may contain a variety of data used for timing and control.

One important item in most adaptation fields is the Program Clock Reference (PCR) field.

Another important item is the splice_countdown field. This field is used to indicate the end of a series of ES access units. It allows the MPEG-2 TS multiplexor to determine appropriate places in a stream where the video may be spliced to another video source without introducing undesirable disruption to the video replayed by the receiver. Since MPEG-2 video uses inter-frame coding, a seamless switch-over between sources can only occur on an I-frame boundary (indicated by a splice count of 0). This feature may, for instance, be used to insert a news flash in a scheduled TV transmission.

One other bit of interest here is the transport_private_data_flag which is set to 1 when the adaptation field contains private data bytes. Another is the transport_private_data_length field which specifies how many private data bytes will follow the field. Private data is not allowed to increase the adaptation field beyond the TS payload size of 184 bytes.
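
The fields mentioned in this section can be pulled out of an adaptation field as sketched below. The byte layout is an assumption taken from the MPEG-2 Systems specification rather than from the text above: a length byte, a flags byte, then the optional PCR (6 bytes: 33 bit base, 6 reserved bits, 9 bit extension), OPCR (6 bytes), splice_countdown (1 signed byte) and private data, in that order. The function name and dictionary keys are hypothetical.

```python
def parse_adaptation_field(packet: bytes) -> dict:
    """Extract the PCR, splice_countdown and private data, when present."""
    control = (packet[3] >> 4) & 0x03
    if control not in (0b10, 0b11):
        return {}                         # no adaptation field in this packet
    length = packet[4]
    info = {"length": length}
    if length == 0:
        return info                       # a single stuffing byte
    flags = packet[5]
    pos = 6
    if flags & 0x10:                      # PCR_flag
        raw = int.from_bytes(packet[pos:pos + 6], "big")
        base, extension = raw >> 15, raw & 0x1FF
        info["pcr_27mhz"] = base * 300 + extension
        pos += 6
    if flags & 0x08:                      # OPCR_flag: skipped in this sketch
        pos += 6
    if flags & 0x04:                      # splicing_point_flag
        info["splice_countdown"] = int.from_bytes(
            packet[pos:pos + 1], "big", signed=True)
        pos += 1
    if flags & 0x02:                      # transport_private_data_flag
        n = packet[pos]                   # transport_private_data_length
        info["private_data"] = packet[pos + 1:pos + 1 + n]
    return info
```

The PCR is returned in 27 MHz ticks, so dividing by 27 000 000 gives seconds under the assumed clock rates.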

DVB Satellite

DVB transmission via satellite (often known as DVB-S) defines a series of options for sending MPEG-TS packets over satellite links. The DVB-S standard requires the 188 B (scrambled) transport packets to be protected by 16 bytes of Reed Solomon (RS) coding.

MPEG Transport Service Encoding Specified by DVB-S

The resultant bit stream is then interleaved and convolutional coding is applied. The level of coding may be selected by the service provider (from 1/2 to 7/8 depending on the intended application and available bandwidth). The digital bit stream is then modulated using Quadrature Phase Shift Keying (QPSK). A typical satellite channel has a 36 MHz bandwidth, which may support transmission at up to 35-40 Mbps (assuming delivery to a 0.5 m receiving antenna).
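
The effect of the two coding stages on the usable bit rate can be estimated with a short calculation. The sketch below assumes QPSK carries 2 bits per symbol and that the Reed Solomon stage turns each 188 B packet into 204 B; the symbol rate of 27.5 Msymbol/s used in the example is an illustrative assumption, not a figure from the text above, and the function name is hypothetical.

```python
from fractions import Fraction


def dvbs_useful_bitrate(symbol_rate, code_rate, bits_per_symbol=2):
    """Estimate the MPEG-TS bit rate (bit/s) carried by a DVB-S channel."""
    reed_solomon = Fraction(188, 204)   # 188 B packet + 16 B of RS parity
    rate = symbol_rate * bits_per_symbol * Fraction(code_rate) * reed_solomon
    return float(rate)


# Example: an assumed 27.5 Msymbol/s carrier at each end of the coding range.
for code_rate in (Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)):
    mbps = dvbs_useful_bitrate(27_500_000, code_rate) / 1e6
    print(f"rate {code_rate}: {mbps:.2f} Mbps")
```

Under these assumptions a rate 3/4 code gives roughly 38 Mbps, which is consistent with the 35-40 Mbps range quoted above.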

Digital Storage Media Command and Control (DSM-CC)

DSM-CC is a toolkit for developing control channels associated with MPEG-1 and MPEG-2 streams. It uses a client/server model connected via an underlying network (carried via the MPEG-2 multiplex or independently if needed). DSM-CC may be used for controlling the video reception, providing features normally found on Video Cassette Recorders (VCR) (fast-forward, rewind, pause, etc). It may also be used for a wide variety of other purposes including packet data


Thursday, February 21, 2008