Address Dependent Filtering

Figure 8-10 shows a NAT that implements address-dependent filtering. This type of NAT is also referred to simply as a restricted NAT. Figure 8-10 uses the same address:port examples as Figure 8-9. Figure 8-10 Address-Dependent Filtering The internal endpoint with source address Ai:Pi sends a packet to an external endpoint with destination address Ae:Pe. The NAT creates a public mapped address Am:Pm. In addition, after the NAT creates this binding, the NAT...

SIP Port Usage

Firewall configuration for SIP is rather simple: port 5060 (UDP or TCP) carries the SIP signaling. The SIP signaling protocol negotiates the media ports for RTP and RTCP, which are UDP ports in the range of 1024 to 65535. A firewall with a SIP ALG snoops the signaling and opens the media ports. However, a SIP ALG does not work with secure SIP. Secure SIP establishes an encrypted signaling channel using Transport Layer Security (TLS) over TCP port 5061. When two endpoints connect using encrypted...

The Symmetric NAT

A symmetric NAT implements a particular combination of mapping and filtering: endpoint-dependent mapping, along with address- and port-dependent filtering. Figure 8-12 shows a symmetric NAT (dynamic address mapping: Ai:Pi -> Am2:Pm2). Instead of allocating a static mapped address:port for each unique internal endpoint, the NAT allocates a unique Am:Pm for bindings created by packets with different external destination addresses, even when the packets come from the same internal endpoint. In the...
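The allocation behavior described above can be sketched as a small simulation. This is a toy model, not an implementation from the text; the class name, addresses, and starting port are hypothetical.

```python
class SymmetricNAT:
    """Toy model of symmetric NAT mapping: a new public (address, port)
    is allocated per (internal endpoint, external destination) pair."""

    def __init__(self, public_ip="203.0.113.1", first_port=1024):
        self.public_ip = public_ip
        self.next_port = first_port
        self.bindings = {}  # (internal, destination) -> mapped (ip, port)

    def map(self, internal, destination):
        key = (internal, destination)
        if key not in self.bindings:
            # New destination seen from this internal endpoint: allocate
            # a fresh mapped port, even though the source is unchanged.
            self.bindings[key] = (self.public_ip, self.next_port)
            self.next_port += 1
        return self.bindings[key]

nat = SymmetricNAT()
m1 = nat.map(("10.0.0.5", 5004), ("198.51.100.7", 5004))
m2 = nat.map(("10.0.0.5", 5004), ("198.51.100.9", 5004))  # different destination
m3 = nat.map(("10.0.0.5", 5004), ("198.51.100.7", 5004))  # same flow again
```

The same internal endpoint receives distinct mapped ports for distinct destinations (m1 differs from m2), while a repeated flow reuses its existing binding (m3 equals m1) — exactly the property that makes symmetric NATs hard for address-discovery protocols such as STUN.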

NAT and PAT

Firewalls at the edges of an enterprise often include functionality called Network Address Translation (NAT). One variant of NAT is Port Address Translation (PAT); however, both functions are often generically lumped together as NATP or simply NAT. The NAT functionality is often part of the firewall and is therefore sometimes referred to as a NAT FW. The NAT device translates the private IP addresses inside the enterprise into public IP addresses visible on the public Internet. Endpoints inside...

Depletion of Network Bandwidth

Depletion-of-network-bandwidth attacks involve flooding the host network with enough data to clog the ingress/egress points of the enterprise network. These attacks appear primarily as a flood of UDP packets. Often, these attacks are launched from a large number of external endpoints on the public Internet, in which case they are referred to as distributed denial-of-service (DDoS) attacks. Solution 1: When a flood attack overwhelms the bandwidth of the connection that links a service provider to...

H261 Compression Standard

The H.261 codec was developed by the ITU (International Telecommunication Union). H.261 is a legacy codec used for only two purposes: H.323 requires that video endpoints support the H.261 format, and H.261 provides interoperability with legacy endpoints. Table A-1 shows the video frame parameters for H.261, including support for frame positions at intervals corresponding to 29.97 Hz. Technically, H.261 defines frames that may occur only at intervals corresponding to 29.97 Hz. However, it allows the encoder to...

SIP Responses

SIP responses are associated with a SIP request. Example 5-2 shows a typical response message; the headers preserved in this excerpt are reproduced below (the From, To, and Contact values are elided):

Via: 172.27.14.4:5070;branch=z9hG4bKhWn9PFlB2yaZbsvp36
Date: Fri, 01 Mar 2002 00:15:28 GMT
Call-ID: 110219848364470172.27.14.4
Server: Cisco-conferenceserver
CSeq: 1 INVITE
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, PRACK, UPDATE, REFER, SUBSCRIBE, NOTIFY, INFO, REGISTER, PUBLISH
Reason: Q.850;cause=47
Content-Length: 0

The first line of the response contains the protocol version (SIP/2.0) and the...

H264 Compression Standard

The H.264 codec was jointly developed by two standards bodies: the ITU and the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission). As a result, H.264 can be found in two different documents: the ITU document H.264 and the ISO document MPEG-4 Part 10. H.264 is also known by its more generic name AVC, for Advanced Video Coding. H.264 has superior performance compared to previous standards such as H.263 or MPEG-4 Part 2. For the same perceptual...

M

An 8x8 DCT is more efficient when representing large, low-frequency areas, because it needs only a few values from the upper-left corner to represent a larger 8x8 area of slowly varying pixel values. However, the H.264 codec achieves good efficiency with a 4x4 transform. The transformation from the spatial domain to the frequency domain facilitates image compression in two ways: Images encoded in the frequency domain can be encoded with fewer bits. The reason is that typical images consist of mainly...
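The energy-compaction claim above can be checked numerically. The sketch below implements a naive orthonormal 2-D DCT-II and applies it to a hypothetical slowly varying 8x8 block (a diagonal brightness ramp, not an image from the text); nearly all the signal energy lands in the upper-left coefficients.

```python
import math

def dct2(block):
    """Naive orthonormal 2-D DCT-II of an NxN block of pixel values."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

# A slowly varying 8x8 block: a diagonal brightness ramp.
block = [[10.0 * (x + y) for y in range(8)] for x in range(8)]
coef = dct2(block)

total = sum(c * c for row in coef for c in row)
low = sum(coef[u][v] ** 2 for u in range(4) for v in range(4))
frac = low / total  # fraction of energy in the upper-left 4x4 quadrant
```

For this smooth block, frac exceeds 0.99: the high-frequency coefficients are nearly zero and can be coded with very few bits, which is the effect the text describes.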

Integer Transform

Unlike the 8x8 transform of most other codecs, H.264 initially defined a 4x4 integer-based transform. The transform provides almost as much frequency separation as the 8x8 DCT but has a simpler integer implementation. The Fidelity Range Extensions (FRExt) subsequently added the option of an 8x8 integer-based transform. H.264 takes a two-stage approach when applying the 4x4 transform. As shown in Figure A-6, when the MB is segmented into 16 4x4 blocks, and when the MB is entirely intracoded, the DC coefficients from each...
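As a sketch of the first stage, the H.264 4x4 forward core transform can be written as Y = Cf * X * Cf^T using the small integer matrix Cf from the standard. The scaling that H.264 folds into quantization is omitted here, so this is an unscaled illustration, not a complete encoder step.

```python
# H.264 4x4 forward core transform: Y = Cf * X * Cf^T (scaling omitted).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(x):
    """Apply the integer core transform to a 4x4 residual block."""
    return matmul(matmul(CF, x), transpose(CF))

# A flat residual block: all energy ends up in the DC coefficient.
flat = [[16] * 4 for _ in range(4)]
y = forward_transform(flat)
```

Because Cf contains only the values 1 and 2, the transform needs only additions, subtractions, and shifts — the "simpler integer implementation" the text refers to. For the flat block, the single DC coefficient carries all the energy.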

Using RTP for Buffer Level Management

Receivers can perform buffer-level management using only RTP packets, without RTCP packets. Receivers must establish an audio jitter buffer level that corresponds to the minimum level required to absorb network jitter, to prevent a nonmalleable device from starving. Then, during the video conference, receivers must monitor the short-term average jitter buffer level to ensure that it is large enough to absorb arrival-time variations of the currently observed network jitter. In addition to...
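One way a receiver can track the "currently observed network jitter" from RTP alone is the interarrival-jitter estimator defined in RFC 3550. The sketch below uses hypothetical timestamp units (a single clock for both RTP timestamps and arrival times) to keep the arithmetic visible.

```python
class JitterEstimator:
    """RFC 3550 interarrival jitter: J += (|D| - J) / 16, where D is the
    change in relative transit time between consecutive packets
    (all values expressed in RTP timestamp units)."""

    def __init__(self):
        self.prev_transit = None
        self.jitter = 0.0

    def update(self, rtp_timestamp, arrival_time):
        transit = arrival_time - rtp_timestamp
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self.prev_transit = transit
        return self.jitter

est = JitterEstimator()
# Perfectly paced packets: 160-unit spacing on both clocks -> zero jitter.
for i in range(10):
    est.update(rtp_timestamp=i * 160, arrival_time=1000 + i * 160)
steady = est.jitter
# One packet arrives 80 units late: the estimate rises by 80/16 = 5.
est.update(rtp_timestamp=10 * 160, arrival_time=1000 + 10 * 160 + 80)
bursty = est.jitter
```

A receiver can compare this running estimate against its current jitter buffer depth and grow the buffer when the estimate approaches it.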

H323 Endpoint Aliasing

When making calls between devices using H.323, a calling device can specify the called party using a number of schemes. H.323 provides several methods for addressing and identifying endpoints, including the following: The E.164 Dialed Digits addressing scheme assigns a dialed-digit string to each device and is one of the more familiar modes of endpoint aliasing. The dialed-digit string is based on the ITU-T E.164 standard, which describes the numbering plan for international public...

Ad Hoc Audio Conferencing

Conferences are often referred to as either ad hoc or scheduled, based on the method by which they are invoked. Ad hoc conferences are created on-the-fly, without any prearranged scheduling. Scheduled conferences are booked in advance. The difference has to do with resource allocation: the conference server has limited resources to perform video and audio mixing. If a conference is scheduled in advance, the conference server is guaranteed to be able to allocate the required audio and video...

Configuring Gatekeeper Support in a Cisco IOS Router

Example 6-1 illustrates a sample H.323 gatekeeper configuration in a Cisco IOS router.

Example 6-1 Sample Cisco IOS Gatekeeper Configuration

gatekeeper
 zone local GK1-SFRY cisco.com
 zone prefix GK1-SFRY 23
 gw-type-prefix 1 * default-technology
 no shutdown

In this simple example, the network has only one gatekeeper. The configuration also shows the following: The zone local statement identifies the local zone name and defines the domain name for endpoints registering with an e-mail address. The...

Configuring a Gatekeeper in Cisco Unified Call Manager

Cisco Unified CallManager (CUCM) supports H.323 gatekeepers, which may be configured using the CUCM configuration web page, as shown in Figure 6-14. In addition, a separate H.225 gatekeeper-controlled trunk definition is required, as shown in Figure 6-15. Cisco Unified CallManager can also interwork with H.323 devices directly, without a gatekeeper. Any device that calls Cisco Unified CallManager resources directly (without a gatekeeper) must have its DNS name or IP address preconfigured in...

RTP Header

As stated in RFC 3550, the RTP header has a 12-octet mandatory part followed by an optional header extension. The header has the format illustrated in Figure 4-2, which shows the fixed header octets followed by the synchronization source (SSRC) identifier, the contributing source (CSRC) identifiers, an optional payload header (depending on the codec), and the payload. The following sections describe the octets in the RTP header shown in Figure 4-2. The fields in the first octet of the RTP header are described as follows: Version (V), 2 bits. This field identifies...
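The 12-octet fixed header can be unpacked in a few lines. The field layout below follows RFC 3550; the sample packet values (payload type 96, sequence 1234, and so on) are hypothetical.

```python
import struct

def parse_rtp_header(data):
    """Parse the 12-octet fixed RTP header (RFC 3550), ignoring any CSRC
    list and header extension that may follow."""
    if len(data) < 12:
        raise ValueError("RTP header is at least 12 octets")
    b0, b1 = data[0], data[1]
    seq, ts, ssrc = struct.unpack("!HII", data[2:12])
    return {
        "version": b0 >> 6,          # V: 2 bits, always 2
        "padding": (b0 >> 5) & 1,    # P: 1 bit
        "extension": (b0 >> 4) & 1,  # X: 1 bit
        "csrc_count": b0 & 0x0F,     # CC: 4 bits
        "marker": b1 >> 7,           # M: 1 bit
        "payload_type": b1 & 0x7F,   # PT: 7 bits
        "sequence": seq,             # 16 bits
        "timestamp": ts,             # 32 bits
        "ssrc": ssrc,                # 32 bits
    }

# Hypothetical packet: V=2, marker set, dynamic PT 96, seq=1234, ts=3000.
packet = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 3000, 0xDEADBEEF)
header = parse_rtp_header(packet)
```

The bit masks mirror the field widths described in the sections that follow: two version bits, one bit each for padding, extension, and marker, four CSRC-count bits, and seven payload-type bits.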

Video Controls Far End Camera Control

Far-end camera control (FECC) enables a user to control the camera position of a remote endpoint and is a feature often found in high-end room systems. It typically requires a camera with a motorized pivot that can rotate with two degrees of freedom (up/down and left/right). Options for control include zoom, pan (left/right rotation), and tilt (up/down rotation). Video conferencing systems use one of two FECC protocols. The first is H.323: Annex Q of H.323 describes the standard FECC protocol for IP networks....

Encoder Overview

Video codecs may apply intracoding or intercoding. An intraframe, also called an I-frame, is a frame that is coded using only information from the current frame; an intracoded frame does not depend on data from other frames. In contrast, an interframe may depend on information from other frames in the video sequence. Figure 3-5 shows an overview of intra-image encoding and decoding. The intracoding model consists of three main processes applied to each frame: transform processing, quantization,...

Frame Rates Form Factors and Layouts

Two endpoints in a video conference negotiate a maximum video bit rate before connecting. Video codecs can generate bitstreams ranging from 64 kbps to 8 Mbps or more. Higher bit rates consume more network bandwidth but provide greater video quality and frame rate. A bit rate of 384 kbps is considered business quality for conferencing systems. However, as high-definition TV (HDTV) video conferencing becomes more prevalent, the definition of business quality might evolve to mandate HDTV...

Hybrid Encoder

Figure 3-18 shows the data flow of the corresponding hybrid encoder. When encoding frame Fn, the first step is performed by the motion estimation unit, which calculates the motion vectors that transform the previously reconstructed image Fn-1,r into the current predicted image Fn,p. However, the motion estimation unit does not directly create the predicted image Fn,p. Instead, the motion estimation unit sends these motion vectors to the motion compensation unit, which applies the motion vectors...

Voice and Video Conferencing Fundamentals

Scott Firestone, Thiya Ramalingam, and Steve Fry Copyright 2007 Cisco Systems, Inc. Published by Cisco Press 800 East 96th Street Indianapolis, IN 46240 USA All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review. Printed in the...

Contents

Chapter 1 Overview of Conferencing Services 3
Conference Types 3
Ad Hoc Conferences 4
Ad Hoc Conference Initiation: Conference Button 4
Ad Hoc Conference Initiation: Meet Me Button 5
Reservationless Conferences 5
Scheduled Conferences 6
Setting Up Scheduled Conferences 6
Joining a Scheduled or Reservationless Conference 8
Scheduled and Reservationless Conference Features 8
Voice and Video Conferencing Components 9
Video Conferencing Modes 11
Voice-Activated Conferences 11
Continuous Presence...

Motion Estimation

Figure 3-19 shows the motion estimation process on the encoder, which is by far the most CPU-intensive step in the encoder/decoder system. Figure 3-19 Motion Estimation Process (the search for motion vector (Mx, My) within the search-area boundary for the 8x8 block) For each block in the original image FN, the encoder searches the reference frame FR in the same vicinity to find the reference block most highly correlated with the original. In this example, the block size is 8x8, and the encoder limits the motion vector to a...
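The search described above can be sketched as an exhaustive match using the sum of absolute differences (SAD), a common matching criterion. The frames, block position, and search range below are synthetic test data, not values from the text.

```python
def sad(cur, ref, bx, by, mx, my, size=8):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by candidate motion vector (mx, my)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + my + y][bx + mx + x])
               for y in range(size) for x in range(size))

def full_search(cur, ref, bx, by, search=3, size=8):
    """Exhaustive block matching over a +/-search window; returns the
    motion vector with the lowest SAD."""
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            cost = sad(cur, ref, bx, by, mx, my, size)
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best[1]

# Synthetic 16x16 reference frame and a copy shifted right by 2, down by 1.
ref = [[(3 * x + 7 * y + x * y) % 31 for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] for x in range(16)] for y in range(16)]

# The search recovers the displacement back toward the reference content.
mv = full_search(cur, ref, bx=4, by=4)
```

The nested loops make clear why this step dominates encoder CPU cost: every candidate vector in the window requires a full block comparison, which is why practical encoders use fast, suboptimal search patterns instead of exhaustive search.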

Voice and Video Conferencing Components

A typical centralized video conferencing system requires a device that acts as the core entity to receive and redistribute streams. This device is known as a multipoint control unit (MCU). The MCU terminates all voice and video media streams in a conference and consists of two types of logical components: a single multipoint controller, generally referred to as an MC or focus, and one or more multipoint processors, generally referred to as an MP or mixer. The MP and MC might reside in separate servers...

RTP Translator

Translators have one input stream and one output stream and forward RTP packets with the SSRC identifier intact. If a translator does not change the sample rate of the stream, the translator can pass RTCP packets unchanged. If the translator alters the sample rate, however, the translator must send RTCP packets with new RTP/NTP timestamp pairs. In a conferencing system, translators take different shapes. Examples are the media termination point (MTP), transcoder, and transrater. Cisco...

RFC 2833 DTMF Detection and Generation

RFC 2833 is a standard that specifies a method of signaling dual-tone multifrequency (DTMF) digits using an RTP payload. The RFC 2833 DTMF detection and generation module is used by the audio mixer to detect incoming digits and to generate outgoing digits if directed to do so by media processing. The detector examines the payload type in the incoming packet header. If the payload type matches the negotiated RFC 2833 value, the packet is further interpreted to determine the DTMF digit it contains....
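The telephone-event payload that carries a digit is only four octets and can be sketched with struct. The field layout follows RFC 2833; the digit, volume, and duration values below are hypothetical.

```python
import struct

def pack_dtmf(event, end, volume, duration):
    """Build the 4-octet RFC 2833 telephone-event payload.
    Events 0-9 are digits '0'-'9'; '*' is event 10 and '#' is event 11."""
    byte1 = (end << 7) | (volume & 0x3F)  # E bit, R bit = 0, 6-bit volume
    return struct.pack("!BBH", event, byte1, duration)

def unpack_dtmf(payload):
    """Interpret a telephone-event payload, as a mixer's detector would."""
    event, byte1, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": byte1 >> 7,
        "volume": byte1 & 0x3F,
        "duration": duration,
    }

# Digit '5', end-of-event set, volume 10 (i.e., -10 dBm0), 800 timestamp units.
wire = pack_dtmf(event=5, end=1, volume=10, duration=800)
digit = unpack_dtmf(wire)
```

Because the digit travels as a tiny structured payload rather than as audio tones, it survives low-bit-rate codecs that would otherwise distort in-band DTMF.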

H225 Message Format

H.225 protocol data units follow the same format as Q.931 messages. Figure 6-2 illustrates the header used by H.225 messages, with bits numbered 8 through 1 across each octet. The following list describes the H.225 message header: Protocol Discriminator (one octet): The Protocol Discriminator identifies the Layer 3 protocol. For Q.931 messages, this value is always 8. It distinguishes user-network call control from other messages. Call Reference Value (one or two octets): This value...

Understanding Lip Sync Skew

Lip sync is the general term for audio/video synchronization; it literally refers to the fact that the visual lip movements of a speaker must match the sound of the spoken words. If the video and audio displayed at the receiving endpoint are not in sync, the misalignment between audio and video is referred to as skew. Without a mechanism to ensure lip sync, audio often plays ahead of video, because the latencies involved in processing and sending video frames are greater than the latencies for...

Frames

For some codecs, interframes are not restricted to contain only P-frames. Another type of interframe is the B-frame, which uses two frames for prediction: the B-frame references a frame that occurs in the past and a frame that occurs in the future. The term B-frame is short for bidirectionally predicted frame. Figure 3-23 shows a sequence of I-, P-, and B-frames. Figure 3-23 Sequence of I-, P-, and B-Frames The arrows in Figure 3-23 show the dependencies between...

Gatekeeper RAS Signaling

Gatekeepers communicate with endpoints, gateways, and MCUs using the RAS protocol. The following sections provide an overview of the basic concepts and messages used in RAS signaling but do not encompass the entire RAS message set. RAS signaling channels are the first to be opened between the gatekeeper and gatekeeper-managed devices and are separate from the call establishment and media channels. RAS signaling uses UDP port 1719 for H.225 messages and UDP port 1718 for multicast gatekeeper...

Depletion of Server Resources

DoS attacks do not always involve depleting the bandwidth on a link; instead, DoS attacks can attempt to deplete resources inside a server or endpoint. In certain cases, servers allocate resources when they receive a packet from the network, and the attacker might seek to exhaust these resources by sending a flood of packets to the victim machine. The classic resource-depletion attack is the SYN attack, which exploits the TCP protocol. In the TCP protocol, an endpoint requests a TCP connection...

DTMF Support

Endpoints that connect to a conference server via a PSTN gateway often must navigate through an IVR using DTMF tones; therefore, DTMF support in the endpoints and the conference server is important to the conferencing solution. Endpoints can use three methods to send DTMF digits. In the first method, voice-band DTMF, the digits are modulated as actual tones in the media: endpoints that dial into a PSTN gateway must play DTMF tones in the media stream so that the PSTN gateway can hear the tones. Endpoints connecting via...

Bandwidth Information in the SDP

Bandwidth usage is specified with the attribute b=<modifier>:<bandwidth value>. The modifier can be application-specific (AS), conference total (CT), or transport-independent application-specific (TIAS), as defined in RFC 3890. The AS bandwidth includes the bandwidth that the RTP data traffic will consume, including the lower layers, down to the IP layer. Therefore, the bandwidth is in most cases calculated over the entire IP packet, which includes the RTP payload, RTP header,...
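As a worked example of counting the whole IP packet, the sketch below computes the AS bandwidth for a hypothetical G.711 stream with 20-ms packetization; the header sizes are the usual RTP/UDP/IPv4 values, and the stream itself is an illustration, not one from the text.

```python
# AS bandwidth counts the whole IP packet, not just the codec payload.
# Hypothetical example: G.711 at 64 kbps with 20-ms packetization.
PAYLOAD = 160            # octets of G.711 per packet (8000 Hz * 0.020 s)
RTP = 12                 # fixed RTP header
UDP = 8                  # UDP header
IPV4 = 20                # IPv4 header (no options)
PACKETS_PER_SECOND = 50  # 1 / 0.020 s

packet_octets = PAYLOAD + RTP + UDP + IPV4            # 200 octets per packet
as_bandwidth_bps = packet_octets * 8 * PACKETS_PER_SECOND
codec_bps = PAYLOAD * 8 * PACKETS_PER_SECOND          # the raw codec rate
```

The result is 80000 bps for a 64000-bps codec, so the session description would carry b=AS:80 (the AS value is expressed in kbps) even though the codec itself runs at 64 kbps.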

IPIP Gateway Inside the Firewall

Figure 8-15 shows a solution for NAT FW traversal using an IP-IP media gateway. Figure 8-15 NAT FW Traversal with an IP-IP Gateway Inside the Firewall In this approach, all media streams coming from or going to internal endpoints flow through the gateway. In addition, this topology has two gatekeepers: an internal gatekeeper to facilitate connections between internal endpoints, and a gatekeeper in the DMZ to allow external...

Escalation and Deescalation

Escalation is a process that allows a video-capable endpoint to join the conference in audio-only mode and later establish a video stream. This process occurs in response to one of two scenarios: end users begin a call in audio-only mode and then decide to add a video connection, either by inserting the camera or by enabling video in their video phone; or an end user turns on a video camera while in an audio-only call, causing the endpoint to automatically establish a video connection. An...

Common H225 Message Types Used in H323 Signaling

This section describes some of the protocol data units (PDU) used in initiating, establishing, and disconnecting H.323 calls. The PDUs are transmitted over the H.225 signaling channel, and each packet is sent as a whole message. The message is framed using a structure called a Transport Protocol Data Unit Packet (TPKT). The TPKT format is defined by IETF RFC 1006 and is used to delimit individual messages within the TCP stream. The TPKT contains a one-octet version ID, followed by a...
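The framing can be sketched in a few lines. The header layout assumed here — one version octet (value 3), one reserved octet, and a 16-bit length that includes the 4-octet header itself — follows RFC 1006; the message contents are hypothetical placeholders.

```python
import struct

TPKT_VERSION = 3

def tpkt_frame(message):
    """Prefix a message with a TPKT header: version, reserved octet, and a
    16-bit total length that includes the 4-octet header itself."""
    return struct.pack("!BBH", TPKT_VERSION, 0, len(message) + 4) + message

def tpkt_unframe(data):
    """Split one TPKT message off the front of a TCP byte stream."""
    version, _reserved, length = struct.unpack("!BBH", data[:4])
    if version != TPKT_VERSION:
        raise ValueError("unexpected TPKT version")
    return data[4:length], data[length:]  # (message, remaining stream)

# Two back-to-back messages in one TCP stream, as a receiver would see them.
stream = tpkt_frame(b"SETUP") + tpkt_frame(b"ALERTING")
first, rest = tpkt_unframe(stream)
second, tail = tpkt_unframe(rest)
```

Because TCP delivers a byte stream with no message boundaries, the explicit length in each TPKT header is what lets the receiver recover whole H.225 messages, as the text describes.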

Configuring Basic Security

Figure 8-4 shows a general configuration for video conferencing security. This configuration involves layers of security, with protection both at the edges of the network and inside the network. Figure 8-4 Basic Configuration for Video Conferencing Security This topology shows a three-legged firewall. The firewall has connections for the enterprise, the...

H263 Compression Standard

The H.263 codec was developed by the ITU and went through three iterations. The first version of the standard was finalized in 1995 and added many enhancements relative to H.261. In the following tables, this version is referred to as Base H.263. The next two iterations of H.263 were issued in 1998 and 2000, with the following further enhancements: H.263v2 (also known as H.263+ or H.263-1998): Sixteen annexes were added, up through Annex T. In addition, the specification added supplemental enhancement...

H263

As described in Chapter 3, Fundamentals of Video Compression, the H.263 codec has three commonly used versions. The RTP payload format for each version differs slightly and is addressed in two different RFCs RFC 2190 defines the payload format for H.263-1996, and RFC 2429 defines the payload format for H.263-1998 and H.263-2000. Figure 4-18 shows the basic format of an H.263 packet. The following sections describe H.263-1996, H.263-1998, and H.263-2000 in more detail. You also learn about key...

H46017

Figure 8-16 illustrates NAT FW traversal with H.460.17. The DMZ contains a traversal server (TS) consisting of a modified gatekeeper. The DMZ GK operates only in GKRCS mode. Inside the enterprise, the diagram shows two types of endpoints: those that support H.460.17 natively, and those that rely on a gateway proxy to incorporate the additional H.323 signaling required by the traversal protocol. The only firewall configuration necessary is a set of stateful bidirectional pinholes. When a signaling...

NAL Packet Aggregation Type

H.264 is a video codec that delivers visual quality superior to H.263 at the same bit rates. H.264 is also referred to as Advanced Video Coding (AVC). H.264 consists of two separate definitions: the video coding layer (VCL) and the network abstraction layer (NAL). The VCL represents the video content, and the NAL defines the packetization format for transport protocols such as RTP. All data is contained in NAL units. The H.264 bitstream can take one of two formats: the NAL unit stream and the byte stream format. We...

H323 Call Flow

Figure 8-5 shows the call flows for H.323. This diagram shows the original simple call flow specified in H.323v1. The basic H.323v1 case includes the following call flow: 1. EP1 and the gatekeeper use the Registration, Admission, and Status (RAS) protocol to pass high-level connection commands. To discover a gatekeeper on the network, endpoints send the RAS Gatekeeper Request (GRQ) message to UDP multicast address 224.0.1.41, on port 1718. In the process of defining the H.323 specification, the H.323...

Video SDP Extensions

The common video codecs used in video conferencing are H.261, H.263, and H.264. This section explains the syntax and semantics for describing parameters related to video codecs. Currently, no standard method exists to specify certain video-related parameters in the SDP offer/answer. These parameters include frame resolution (also called form factor; resolution means size, such as 320x240). Video endpoints and conference servers use the a=fmtp attribute to carry codec-specific...
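A minimal parser for a=fmtp lines can be sketched as follows, assuming the common style of semicolon-separated parameters; the payload type and parameter names in the sample line are hypothetical examples, not values from the text.

```python
def parse_fmtp(line):
    """Parse an SDP a=fmtp line into (payload_type, {param: value}).
    Assumes semicolon-separated 'key=value' parameters; bare tokens
    without '=' are stored with a value of None."""
    body = line[len("a=fmtp:"):]
    pt_str, _, params_str = body.partition(" ")
    params = {}
    for item in params_str.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        params[key] = value if sep else None
    return int(pt_str), params

# Hypothetical format line advertising supported picture sizes.
pt, params = parse_fmtp("a=fmtp:34 CIF=1;QCIF=1")
```

Because the attribute is deliberately opaque to SDP itself, each codec's RTP payload specification defines what the parameters mean; the parser above only recovers the raw key/value pairs for the codec-specific layer to interpret.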

Media Control Support

Two primary video-specific media control operations need to be supported in video conferences. The first is video fast update (VFU, also called fast video update, or FVU): an endpoint issues a VFU if its decoder requires an I-frame to continue decoding the video stream. When the encoder receives the VFU, it encodes the next frame as an I-frame. The decoder can request a full update or can ask the encoder to update only a part of the frame. The frame is divided into smaller parts, each called a group of blocks...