A

The SIP client issues an INVITE to the server, attempting to connect to a protected resource. The server rejects this initial request and issues a challenge to the client. In the case of a SIP server, this challenge takes the form of HTTP digest authentication: a WWW-Authenticate header sent along with a 401 error response. The following shows some of the information contained in the challenge message. The message contains several parameters, including the name of the realm and a nonce value. The nonce...
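The digest computation that the client performs with the realm and nonce can be sketched as follows, using the MD5 scheme of RFC 2617 that SIP reuses. The credentials, URI, and nonce below are made-up values, and the qop extension is omitted for brevity:

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """RFC 2617 digest response (without the qop extension)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret-derived half
    ha2 = md5_hex(f"{method}:{uri}")                 # request-derived half
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

# Hypothetical values for a SIP INVITE challenge
resp = digest_response("alice", "cisco.com", "secret",
                       "INVITE", "sip:conf@cisco.com", "84f1c1ae6cbe5ua9")
print(resp)  # 32-hex-digit response the client returns in its Authorization header
```

The server repeats the same computation with its stored credentials and compares the results; the nonce ties the response to this particular challenge.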

About the Authors

Scott Firestone holds a master's degree in computer science from MIT and has designed video conferencing and voice products since 1992, resulting in five patents. During his 10 years as a technical leader at Cisco, Scott developed architectures and solutions related to video conferencing, voice and video streaming, and voice-over-IP security. Thiya Ramalingam is an engineering manager for the Unified Communications organization at Cisco. Thiya holds a master's degree in computer engineering and...

About the Technical Reviewers

Jesse Herrera is a senior systems analyst for a Fortune 100 company in Houston, Texas. Mr. Herrera holds a bachelor of science degree in computer science from the University of Arizona and a master of science in telecommunication management from Southern Methodist University. His responsibilities have included design and implementation of enterprise network architectures, including capacity planning, performance monitoring, and network management services. His recent activities include...

Accessing the Focus

The central entity in the distributed architecture is called the focus. The focus maintains a signaling relationship with all the endpoints (or participants) in the conference. Conference and participant operations, such as creating, maintaining, and destroying conferences and adding or deleting participants, occur in the focus. Each conference must have a unique address of record (AoR) that corresponds to a focus. A conference server could contain multiple focus instances, and each focus may control a...

Acknowledgments

Nermeen Ismail provided a cover-to-cover review of the book, lending considerable expertise in video and voice over IP. Jesse Herrera also provided a full review, verifying all parts of the text in minute detail. The authors are particularly grateful to Stuart Taylor for providing a number of suggestions and comments on the introduction and architecture chapters, and to Tripti Agarwal for taking time to review the H.323 section and provide her insight on CallManager signaling implementation details...

Ad Hoc Conference Initiation: Conference Button

The Conference button on the phone creates an ad hoc conference by expanding a two-party call into a multiparty conference. Consider the following call scenario:

1. Bob places a call to Alice, and Alice answers.
2. Bob decides to include Fred in the call. Bob presses the Conference button to put Alice on hold.
3. Bob places a call to Fred, and Fred answers. Bob announces that he will include Fred in the preexisting conversation with Alice.
4. Bob presses the Conference button again to connect...

Ad Hoc Conference Initiation: Meet Me Button

A Meet Me conference is one in which a number of destination telephone numbers are set aside for conferencing purposes. Each number corresponds to a unique conference that users can join on an ad hoc basis. Administrators set up these numbers by configuring the local phone system to forward these calls to a conference server. After the phone system redirects the calls, the conference server manages them independently. When these numbers are known, any caller can join them. Security consists of...

Ad Hoc Conferences

As previously stated, ad hoc conferences are the simplest form of meeting. Phone users create them in two ways: when the meeting host presses the Conference button on the phone, which enables a user to escalate an existing two-party call into one with multiple participants, or by using the Meet Me option on the phone. Ad hoc meetings do not reserve resources in advance and do not require participants to interact with a voice user interface before joining the meeting.

Ad Hoc Video Conferencing

A video-enabled endpoint uses the same procedure to join a conference but offers additional parameters in the SDP offer to describe the properties of the video media stream. Example 5-4 shows an SDP offer, in which endpoint A sends an INVITE to the conference server.

Example 5-4 SDP Offer from an Endpoint for Joining Ad Hoc Video Conference

o=san 1549546120 0 IN IP4 10.10.10.26
c=IN IP4 10.10.10.26
m=audio 49220 RTP/AVP 0 8
m=video 49222 RTP/AVP 109 34 96 31
a=rtpmap:109 H264/90000
a=fmtp:109 ...

Address- and Port-Dependent Filtering

Figure 8-11 shows a NAT that implements address- and port-dependent filtering. After the NAT creates the binding, it forwards a packet from the external network to the internal network only if the source address:port of the packet is Ae:Pe and the destination address:port of the packet is Am:Pm. In this case, only the endpoint that received the packet can send a packet back to the internal network, and the packet must have a source port equal to the...
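The filtering rule can be expressed as a small predicate. The addresses below are illustrative documentation-range values standing in for Ae:Pe, Am:Pm, and Ai:Pi from the figure, not values taken from it:

```python
def allow_inbound(binding, src, dst):
    """Address- and port-dependent filtering: forward an external packet
    only if it comes from the exact address:port the internal host
    contacted (Ae:Pe) and targets the mapped address:port (Am:Pm)."""
    return src == binding["external"] and dst == binding["mapped"]

binding = {
    "internal": ("10.0.0.5", 5060),      # Ai:Pi (internal endpoint)
    "mapped":   ("203.0.113.7", 62000),  # Am:Pm (NAT's public mapping)
    "external": ("198.51.100.9", 5060),  # Ae:Pe (external peer contacted)
}

print(allow_inbound(binding, ("198.51.100.9", 5060), ("203.0.113.7", 62000)))  # allowed
print(allow_inbound(binding, ("198.51.100.9", 7078), ("203.0.113.7", 62000)))  # blocked: wrong source port
```

Under endpoint-independent filtering, by contrast, the source check would be dropped entirely, which is why that mode is friendlier to media traversal.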

Annex N

This annex provides a reference picture selection mode. This mode provides two features. First, the encoder can use a number of picture memories and select one of them as the reference picture in the encoding of the current frame; the amount of picture memory available at the decoder might be signaled via external means to help the memory management in the encoder. Second, the decoder may use a back channel to send the encoder information on which parts of which pictures have been correctly decoded at the...

Annex W

This annex provides additional supplemental enhancement information. Annex W defines two values that were reserved in Annex L. Fixed-point inverse DCT (IDCT) indicates that a particular IDCT approximation is used to construct the bitstream; the annex specifies a particular reference IDCT implementation. Picture message indicates one or more octets representing message data. The annex specifies several message types, one of which is caption text. Note that this recommendation puts no restriction on how caption...

ARP Cache Poisoning

When a host attempts to send a packet to an IP address on the same subnet, the originating host must discover the Ethernet MAC address corresponding to the destination IP address. The originating host learns about this mapping by issuing an ARP request packet, which requests the MAC address used by the destination IP address. The destination machine receives this request and responds with an ARP reply that contains the MAC address. The originating host caches this IP-to-MAC address mapping into...
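A few lines of Python show why this caching behavior is exploitable: the cache accepts any reply, with no check that it answers an outstanding request or comes from the rightful owner of the IP address. The MAC and IP values are made up:

```python
# Sketch of an unauthenticated ARP cache -- not any real stack's code.
arp_cache = {}

def handle_arp_reply(ip, mac):
    arp_cache[ip] = mac   # blindly cache/overwrite the IP-to-MAC mapping

handle_arp_reply("10.0.0.1", "00:1b:54:aa:bb:cc")  # legitimate gateway reply
handle_arp_reply("10.0.0.1", "de:ad:be:ef:00:01")  # attacker's forged reply wins
print(arp_cache["10.0.0.1"])  # traffic for 10.0.0.1 now goes to the attacker's MAC
```

Mitigations such as dynamic ARP inspection work precisely by adding the missing check: replies are validated against a trusted binding table before the cache is updated.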

Asymmetric Encryption: Public Key Cryptography

Unlike symmetric encryption, where both sender and receiver use the same key, public key encryption uses two keys. In this approach, each endpoint creates a public key and a private key. Each endpoint keeps the private key secret but makes the public key widely available. Public key cryptography can perform two major functions: encryption and integrity protection. When used for encryption, public key cryptography relies on the fact that data encrypted with the public key can be decrypted only...
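The encryption function can be illustrated with textbook RSA using deliberately tiny primes. This is purely a sketch of the public-key idea, not a usable cipher; real deployments use a vetted library and keys of 2048 bits or more:

```python
# Toy RSA key generation with tiny primes (illustration only).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key

def encrypt(m, pub):
    exp, mod = pub
    return pow(m, exp, mod)

def decrypt(c, priv):
    exp, mod = priv
    return pow(c, exp, mod)

c = encrypt(65, (e, n))    # anyone can encrypt with the public key...
print(decrypt(c, (d, n)))  # ...but only the private-key holder recovers 65
```

Swapping the roles of the keys (encrypt with the private key, verify with the public key) is the basis of the integrity-protection function mentioned above.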

Audio Mixer

Within a conference, the audio mixer is responsible for selecting the input streams and summing these streams into a mixed output stream. This section provides a detailed view into the various modules that comprise it. The audio mixer is the core component in the media plane. It is responsible for selecting incoming audio streams, summing them, and distributing the summed output back to the participants. When mixing audio streams in a large conference, the audio mixer selects only a subset of...
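The summing step can be sketched in a few lines. This assumes aligned 16-bit PCM frames and simply clamps the sum to the representable range, which is one of several overflow strategies a real mixer might use:

```python
def mix_frames(frames):
    """Sum aligned 16-bit PCM frames sample-by-sample, clamping to the
    signed 16-bit range to avoid wraparound distortion."""
    mixed = []
    for samples in zip(*frames):
        s = sum(samples)
        mixed.append(max(-32768, min(32767, s)))
    return mixed

a = [1000, -2000, 30000]   # selected stream 1 (decoded samples)
b = [500, -1500, 10000]    # selected stream 2
print(mix_frames([a, b]))  # [1500, -3500, 32767] -- last sample clamped
```

In practice the mixer also subtracts each selected speaker's own contribution from the copy sent back to that speaker, so participants do not hear themselves.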

Audio Receiver Path

The receiver requires the jitter buffer in the audio path because packets arriving at the receiver do not have uniform arrival times. The sending endpoint typically sends fixed-sized RTP packets onto the network at uniform intervals, generating a stream with a constant audio bit rate. However, jitter in the network due to transient delays causes nonuniform spacing between packet arrival times at the receiver. If the network imposes a temporary delay on a sequence of several packets, those...
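A minimal fixed-delay jitter buffer makes the smoothing idea concrete. This sketch (not any product's implementation) holds each packet until it is delay_ms old, so arrival jitter up to that bound is absorbed; the millisecond timestamps are hypothetical:

```python
import heapq

class JitterBuffer:
    """Fixed-delay jitter buffer sketch: packets are released in timestamp
    order, but only once they are delay_ms old, absorbing up to that much
    arrival-time jitter."""
    def __init__(self, delay_ms=60):
        self.delay_ms = delay_ms
        self.heap = []            # (media_timestamp_ms, payload)

    def arrive(self, ts_ms, payload):
        heapq.heappush(self.heap, (ts_ms, payload))

    def playout(self, now_ms):
        """Release every packet whose scheduled playout time has passed."""
        out = []
        while self.heap and self.heap[0][0] + self.delay_ms <= now_ms:
            out.append(heapq.heappop(self.heap)[1])
        return out

jb = JitterBuffer(delay_ms=60)
jb.arrive(0, "pkt0")
jb.arrive(20, "pkt1")
print(jb.playout(60), jb.playout(80))  # ['pkt0'] ['pkt1']
```

Adaptive jitter buffers go further and resize delay_ms at runtime based on observed jitter, trading added latency against the risk of late-packet discard.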

C

CA (certificate authority), 303 calculating Call Proceeding messages (H.225), 190 CAs (certificate authorities), certificate enrollment process, 306-307 CDP (CRL Distribution Point), 307 centralized conferencing architecture, 37-38 centralized multipoint conferencing model, 157 certificate-based key distribution, 309 certificates, 302-304 endpoint authentication, 307 enrollment, 306-307 installing, 305-306 nonrepudiation, 309 reenrollment, 309 revoking, 307-309 CHC (Conversational High...

Call Hold Signaling with the Empty Capability

To indicate to the remote device that a hold operation is in progress, the endpoint initiating the hold operation sends a special form of the TCS, known as the ECS message, sometimes referred to as TCS=0. The ECS is a TCS with all capability fields set to null, and support for it is a mandatory part of H.323 Version 2 and later. It does not disconnect the call but simply informs the remote side that the sender does not currently have any decoding capability. As a result, the remote side closes...

Call Transfer with the Empty Capability

Call transfer using ECS requires that the phones involved use a common H.323 signaling agent. When a call is connected and the transfer button is pressed, the H.323 call signaling agent in the transferring phone sends ECS to the remote device, and media is closed. When the party to which the call is being transferred answers and the transfer button is pressed again, the H.323 call signaling agent sends a new TCS and negotiates media on behalf of the phone to which the call was transferred. For...

CAM Table Flooding

One Layer 2 exploit is a content-addressable memory (CAM) table flood, which allows an attacker to make a switch act like a hub. A hub forwards all packets to all ports. A switch learns about Ethernet MAC addresses at each of its ports so that it can forward packets only to the port that provides a link to the destination address of the packet. In a heavily switched environment, an attacker receives only packets destined for the attacker. By exploiting a CAM table flood, the attacker can cause...

Canonical RTP Model

Figure 7-12 shows the canonical RTP/RTCP model for a video/audio sender and receiver. The figure shows five different clocks. At the sender: Clock A, used by the audio capture hardware to sample audio data; Clock B, used by the video capture hardware to sample video data; and Clock C, the common timebase clock at the sender, used for the purposes of stream synchronization with RTCP packets. Clock D is the clock used by the audio playout hardware to play audio data...

Centralized Architecture

In a centralized model, all the components of a conferencing system are implemented in a single server. Figure 2-6 shows an example of a centralized conferencing system with the necessary software modules. These software modules interact with each other through the interprocess communication methods provided by the operating system running on that server. The...

Codecs, Bit Rates, and Annexes Supported by Endpoints

Table A-23 identifies the annexes and codecs supported by different enterprise endpoints. For example, the Polycom ViewStation supports annexes F, I, and T at 64K and 128K bit rates, along with H.261, H.263, H.263-1998, and H.264. The VSX 3000 and VSX 7000 also support SIP signaling. Cisco soft clients include Cisco Unified Personal Communicator (CUPC) and Cisco Unified Video Advantage (CUVA); E-Conf Version 4 supports the H.264 baseline profile.

Color Formats

The color and brightness information for pixels can be represented in one of several data formats. The two common formats are RGB and YCbCr. The RGB format represents each pixel using values for the red (R), green (G), and blue (B) additive color components. The YCbCr format represents each pixel using the brightness value (Y), along with color difference values (Cb and Cr), which together define the saturation and hue (color) of the pixel. The brightness values comprise the luminance channel,...
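As a sketch, the widely used JPEG/JFIF full-range conversion (derived from the BT.601 coefficients) maps RGB to YCbCr as follows; other scalings, such as studio range, differ slightly:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB (0-255) to YCbCr, JPEG/JFIF variant of the BT.601
    coefficients. Cb and Cr are centered at 128."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b           # luminance (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr(255, 255, 255))  # pure white: full luma, neutral chroma
```

Because the eye is less sensitive to chroma than to luma, codecs can subsample Cb and Cr (for example, 4:2:0) with little visible loss, which is the practical reason this format dominates video compression.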

Command Syntax Conventions

The conventions used to present command syntax in this book are the same conventions used in the IOS Command Reference, which describes them as follows: Boldface indicates commands and keywords that are entered literally as shown. In actual configuration examples and output (not general command syntax), boldface indicates commands that are manually input by the user (such as a show command). Italic indicates arguments for which you supply actual values. Vertical bars...

Common Reference Lip Sync

The goal of lip sync is to preserve the relationship between audio and video in the presence of fluctuating end-to-end delays in both the network and the endpoints themselves. Therefore, the most important restriction to keep in mind when discussing lip sync for video conferencing is the following: video conferencing systems cannot accurately measure or predict all delays in the end-to-end path for either the audio or video stream. This restriction leads to the most important corollary of lip...

Compensating for Network Issues: The Jitter Buffer

Receivers must handle three potential anomalies in the input audio stream: RTP packets arriving at a receiver may exhibit variability in arrival times (jitter) introduced during transmission over the network; packets may arrive at the mixer out of order; and RTP packets can be duplicated in the network, resulting in two or more copies of the same packet. However, for the mixer to operate properly, it must receive a stream of packets with uniform interpacket spacing, in the order they were...
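The reordering and duplicate-suppression duties can be sketched as a batch helper over (sequence number, payload) pairs. Real receivers do this incrementally inside the jitter buffer and must also handle 16-bit sequence-number wraparound, which this sketch ignores:

```python
def reorder_and_dedup(packets):
    """Restore in-order, duplicate-free packet flow for the mixer:
    drop repeated RTP sequence numbers, then sort by sequence number."""
    seen = set()
    unique = []
    for seq, payload in packets:
        if seq not in seen:          # suppress network-duplicated packets
            seen.add(seq)
            unique.append((seq, payload))
    return sorted(unique)            # restore transmission order

arrived = [(3, "c"), (1, "a"), (2, "b"), (3, "c")]  # out of order + duplicate
print(reorder_and_dedup(arrived))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```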

Components of a Conferencing System

A conferencing system is composed of several components, including a user interface, a conference policy manager, media control, a player/recorder, and other subsystems. This section explores these individual elements, providing details about the functionality found in each service and how together they make up a conferencing system. Figure 2-1 shows the major layers of a conferencing system. User interface: The user interface typically consists of several separate interfaces, including a scheduler to...

Conference Control

The conference control layer has three main functions. Conference management and scheduling: The conference scheduler works with the resource allocation module to reserve ports during the time window when meetings are scheduled to be active. The resource allocation module is aware of how the administrator has configured the system with respect to conferencing, floater, and overbook ports and uses this information when responding to resource allocation requests. At meeting time, after the user has...

Conference Policy Server

The conference policy server is the repository for the various policies stored in the system. There is only one instance of the conference policy server within the system. No standard protocol exists for communication between the focus and the policy server. Users join a conference by sending a SIP INVITE to the unique URI of the focus. If the conference policy allows it, the focus connects the participant to the conference. When a participant's SIP endpoint wants to leave the conference, the...

Conference URI

A conference in a SIP framework is identified through a conference URI. The conference URI is the destination to which all SIP requests are sent; it is created and managed by the conference server. An example of a conference URI is sip:meetingplace@cisco.com. Users can enter these URIs manually in their SIP client to dial into the conference system. Alternatively, the conference system embeds the URI in a web link and sends the link to the user through e-mail or instant messenger. If the user dials in...

Conferencing System Design and Architecture

This chapter examines various conferencing system architectures, their design, and the interactions of the modules that comprise the system. Details are provided about the user interface, conference control, and control and media planes from which conferencing systems are constructed. The later sections of this chapter discuss architectural models. In addition, specific conferencing system features and operational modes are reviewed in detail. Topics include the role of a conference moderator,...

Confidentiality Attacks

Without confidentiality, an attacker can listen to the audio and video streams between two endpoints. Hacker tools are available on the Internet for eavesdropping on voice packet data. One of these tools is called VOMIT (Voice Over Misconfigured IP Telephony). VOMIT processes a stream of captured voice packets and plays the audio. Solution: Apply encryption to the media packets. Vendors of conferencing products are universally adopting the Advanced Encryption Standard (AES) to encrypt media...

Connection Hijacking

After two video conferencing endpoints establish a legitimate connection, an attacker might attempt to hijack the connection by impersonating one of the participants and issuing signaling commands to take over the conversation. The attacker might also use this type of spoofing to cause the connection to fail, in which case the attack is also considered a DoS attack. Solution: Endpoints can thwart connection hijacking by authenticating the signaling messages.

RTP Hijacking

Whereas connection...

Continuous Presence Conferences

Continuous presence (CP) conferences have the benefit of displaying two or more participants simultaneously, not just the image of the loudest speaker. In this mode, the video MP tiles together streams from multiple participants into a single composite video image, as illustrated in Figure 1-2. CP conferences are also referred to as composition mode conferences or Hollywood Squares conferences. The video MP can either scale down the input streams before compositing or maintain the sizes of...

Correlating Timebases Using RTCP

The RTCP protocol specifies the use of RTCP packets to provide information that allows the sender to map the RTP domain of each stream into a common reference timebase on the sender, called the Network Time Protocol (NTP) time. NTP time is also referred to as wall clock time because it is the common timebase used for all media transmitted by a sending endpoint. NTP is just a clock measured in seconds. RTCP uses a separate wall clock because the sender may synchronize any combination of media...

Criteria for Determining Whether a Stream Should Be Mixed

The algorithm first determines the number of currently active streams. If the number is less than the maximum allowed (usually three to four), the algorithm includes the next available stream in the mixed stream. Any time the number of current speakers is less than the maximum, the mixer does not invoke the speaker selection algorithm, as long as the stream meets the earlier eligibility criteria. If the number of active streams exceeds the maximum, the algorithm must determine whether a new...
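The selection rule described above can be sketched as follows. The use of a measured-energy value as the ranking metric, the stream labels, and the displacement rule are illustrative assumptions, not the book's exact algorithm:

```python
def update_selection(selected, candidate, max_mixed=3):
    """Admit the candidate outright while fewer than max_mixed streams are
    mixed; otherwise it must out-rank (here: be louder than) the quietest
    currently selected speaker. `selected` maps stream id -> energy."""
    sid, energy = candidate
    if len(selected) < max_mixed:
        selected[sid] = energy           # room left: no selection needed
        return selected
    quietest = min(selected, key=selected.get)
    if energy > selected[quietest]:      # displace the weakest speaker
        del selected[quietest]
        selected[sid] = energy
    return selected

mix = {}
for cand in [("A", 0.9), ("B", 0.4), ("C", 0.6), ("D", 0.5)]:
    mix = update_selection(mix, cand)
print(sorted(mix))  # ['A', 'C', 'D'] -- D displaced B, the quietest
```

Real mixers also apply hysteresis so that a stream hovering near the threshold does not flap in and out of the mix.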

Delay Accumulation

Skew between audio and video might accumulate over time in either the video or audio path. Each stage of the video conferencing path injects delay, and these delays fall under three main categories. Delays at the transmitter: the capture, encoding, and packetization delay of the endpoint hardware devices. Delays in the network: the network delay, including gateways and transcoders. Delays at the receiver: the input buffer delay, the decoder delay, and the playout delay on the endpoint hardware...

Delays in the Network Path

A lip sync solution must work in the presence of many delays in the end-to-end path, both in the endpoints themselves and in the network. Figure 7-3 shows the sources of delay in the network between the sender and the receiver, including the xCoder. The network-related elements consist of routers, switches, and the WAN. Router X experiences congestion at time T, resulting in a step...

Desktop Conferencing Systems

Low-end video conferencing products include desktop endpoints. When compared to high-end systems, the main difference is the maximum bit rate supported by the encoder in the sending direction. Other components in desktop endpoints include the following: An inexpensive camera that generates more noise than a high-end model, which paradoxically results in a higher encoded video bit rate for the same quality. In addition, the fixed cameras do not allow remote control via far-end camera control...

Desktop Endpoint Attacks

Desktop video conferencing systems that run on PCs are vulnerable to operating system-based exploits. As mentioned in the section "Malware," a worm can execute a program on a vulnerable machine, causing a DoS attack. As mentioned in the section "Denial of Service," an attacker can attempt to flood a PC with packets that consume resources. Solution: A HIPS running on the PC can mitigate operating system vulnerabilities.

Firmware Attacks

Some appliance-based video conferencing endpoints run firmware...

Detecting Stream Loss

Conference server components must handle endpoint failures properly. Signaling protocols might provide some failure information, such as the SIP Session-Expires header. However, the media plane of the entire conferencing architecture must ensure that a backup mechanism detects and handles an endpoint failure in mid-session. The two common mechanisms for handling such scenarios are Internet Control Message Protocol (ICMP) unreachable messages and RTP inactivity timeouts. If the application...

DHCP Exhaustion

DHCP exhaustion is a Layer 2 attack that also implements a DoS. An attacker sends a flood of DHCP request packets to the DHCP server, each requesting an IP address for a random MAC address. Eventually, the DHCP server runs out of available IP addresses and stops issuing DHCP bindings. This failure means that other hosts on the network cannot obtain a DHCP lease, which causes a DoS. Solution: Cisco switches implement a feature called DHCP snooping, which places a rate limit on DHCP requests.

Early and Delayed Offer

Endpoints establish connections on the media plane by first negotiating media properties such as codec types, packetization periods, media IP addresses and RTP port numbers, and so on. This information is transmitted in SIP messages using SDP. An endpoint may use two methods of exchanging SDP information. Early offer: the endpoint sends the media SDP in the initial INVITE and receives an answer from the conference server. Delayed offer: the endpoint sends an...

Encoder

The encoding module compresses the mixed stream using the compression algorithm (for example, G.711 uLaw, G.729, or G.722) negotiated for this endpoint. After compression, the encoder performs the RTP packetization. The steps in RTP packetization include the following. Setting the RTP payload type: The encoder sets the payload type field based on the codec used for compressing the payload; the payload type indicates to the receiver how to decode the arriving packet. Setting the RTP time...
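As a sketch of the packetization step, the fixed 12-byte RTP header defined in RFC 3550 can be packed as shown below. The field values are hypothetical, and a real encoder also increments the sequence number and timestamp for every packet it emits:

```python
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc):
    """Pack the fixed 12-byte RTP header (RFC 3550): version 2, no
    padding, no extension, no CSRC entries, marker bit clear."""
    byte0 = 2 << 6                   # version=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F      # M=0, 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

# payload_type=0 is G.711 uLaw (PCMU); seq/timestamp/ssrc are made up
hdr = build_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
print(len(hdr), hdr.hex())  # 12 80000001000000a000001234
```

The compressed audio frame is then appended after this header to form the complete RTP packet.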

Endpoint-Independent Filtering

Figure 8-9 shows a NAT that uses endpoint-independent filtering. The figure includes the following addresses that appear on the internal private network: Ai:Pi, the source address:port of packets from the internal endpoint, and Ae:Pe, the destination address:port of packets from the internal endpoint. Figure 8-9 also includes the following addresses that appear on the public network: Am:Pm, the source address:port of packets from the NAT to endpoints on the public...

Entropy Coding

Table A-18 shows the attributes of entropy coding in H.264. The run and level are not coded jointly. H.264 codes the number of coefficients using a context-adaptive VLC table. H.264 codes the zero-run length sequence using a context-adaptive VLC. H.264 codes the coefficient levels using a fixed VLC table. H.264 codes trailing ones (+1 or -1) as a special case. Motion vectors are coded using a modified Exp-Golomb, nonadaptive VLC. Two zigzag...

Entry IVR

The entry IVR plays a prompt such as "Welcome to xxx" and asks the caller to enter the conference ID. In a distributed conferencing model, however, one central, logical conference server is composed of many individual servers. An endpoint might need to be moved from one physical server to another. In Figure 5-12, endpoint EP dials into the entry IVR associated with the conference server, enters the meeting ID, and goes through the name-recording process. Centralized logic then moves the endpoint to another entity in the conference server that hosts the...

Error Resiliency

If the network drops bitstream packets, decoders may have difficulty resuming the decoding process for several reasons. Bitstream parameters may change incrementally from one macroblock (MB) to another. One example is the quantization level: most codecs allow the bitstream to change the quantization level by a delta amount between MBs. If the network drops a packet, the decoder will not have access to the previous incremental changes in the quantization level and will not be able to determine the current...

Escalation of Point-to-Point to Multipoint Call

In this scenario, a point-to-point call between two participants becomes a conference call with more than two parties. Participant A is in a point-to-point call with participant B and wants to invite a third participant, participant C. Participant A finds a conference server, sets up the conference, gets the URI or meeting ID, and transfers the point-to-point call to the conference server. Participant A then invites participant C into the conference call. Participant A can add participant C...

Evaluating Video Quality: Bit Rate and Signal-to-Noise Ratio

When evaluating the efficiency of a video codec, there is one primary criterion: the quality at a given bit rate. Most video conferencing endpoints negotiate a maximum channel bit rate before connecting a call, and the endpoints must limit the short-term one-way average bit rate to a level below this negotiated channel bit rate. A higher-efficiency codec can provide a higher-quality decoded video stream at the negotiated bit rate. Quality can be directly measured in two ways: by visually...
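The usual numerical measurement is peak signal-to-noise ratio (PSNR) between the reference and decoded frames. A minimal sketch over flat lists of 8-bit samples (hypothetical values) looks like this:

```python
import math

def psnr(reference, decoded, max_value=255):
    """PSNR in dB between a reference frame and its decoded version,
    both given as flat lists of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float("inf")   # identical frames: infinite PSNR
    return 10 * math.log10(max_value ** 2 / mse)

ref = [52, 55, 61, 66]        # original samples (hypothetical)
dec = [52, 54, 61, 67]        # decoded samples after lossy compression
print(round(psnr(ref, dec), 1))
```

In codec comparisons, PSNR is typically computed on the luminance channel only and plotted against bit rate, which is the form of graph the H.264 comparison cited later in the book uses.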

Event Subscription and Notification

RFC 3265 extends the SIP specification, RFC 3261, to support a general mechanism for subscribing to asynchronous events. Such events can include statistics, alarms, and so on. The two types of event subscriptions are in-dialog and out-of-dialog. A subscription that uses the Call-ID of an existing dialog is an in-dialog subscription, whereas an out-of-dialog subscription carries a Call-ID that is not part of any existing ongoing dialog. Figure 5-6 shows an example of out-of-dialog...

Feedback Information

At Cisco Press, our goal is to create in-depth technical books of the highest quality and value. Each book is crafted with care and precision, undergoing rigorous development that involves the unique expertise of members from the professional technical community. Readers' feedback is a natural continuation of this process. If you have any comments regarding how we could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through email at...

Forming RTCP Packets

Each RTP stream has an associated RTCP packet stream, and the sender transmits an RTCP packet once every few seconds, according to a formula given in RFC 3550. As a result, RTCP packets consume a small amount of bandwidth compared to the RTP media stream. For each RTP stream, the sender issues RTCP packets at regular intervals, and those packets contain a pair of time stamps: an NTP time stamp and the corresponding RTP time stamp associated with that RTP stream. This pair of time stamps...
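Given that pair of time stamps, a receiver can place any later RTP timestamp on the sender's wall clock. This small sketch assumes hypothetical sender-report values and an 8000-Hz audio clock:

```python
def rtp_to_ntp(rtp_ts, sr_ntp, sr_rtp, clock_rate):
    """Map an RTP timestamp to sender wall-clock (NTP) time, using the
    (NTP, RTP) timestamp pair from the last RTCP sender report."""
    return sr_ntp + (rtp_ts - sr_rtp) / clock_rate

# Sender report said: RTP timestamp 8000 corresponds to NTP time 1000.0 s.
# At an 8000-Hz audio clock, RTP timestamp 12000 is half a second later.
print(rtp_to_ntp(12000, sr_ntp=1000.0, sr_rtp=8000, clock_rate=8000))  # 1000.5
```

Doing this independently for the audio and video streams puts both on the same sender timebase, which is exactly the correlation lip sync needs.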

Full Mesh Networks

Another option for decentralized conferencing is a full-mesh conference, shown in Figure 2-7. This architecture has no centralized audio mixer or MP. Instead, each endpoint contains an MP that performs media mixing, and all endpoints exchange media with all other endpoints in the conference, creating an N-by-N mesh. Endpoints with less-capable MPs provide less mixing functionality. Because each device sends its media to every other device, each one establishes a one-to-one media connection with...

Gatekeeper Signaling Options

There are two signaling modes in a gatekeeper-controlled H.323 network: direct endpoint signaling and gatekeeper routed call signaling (GKRCS). When the gatekeeper is configured for direct endpoint signaling, the calling and called endpoints exchange RAS admission control messages with the gatekeeper, but the H.225 and H.245 messages are exchanged directly between the calling and called endpoints, without gatekeeper involvement. Figure 6-12 shows the signaling path for direct endpoint...

General Port-Based Attacks

Much like PC-based endpoints, servers require protection to thwart network port-based attacks such as malware and DoS attacks. Solution: You can mitigate port-based attacks as follows: use a HIPS to detect attacks on the machine; install a virus scanner on the server; and place a firewall in front of the server. In addition to typical firewall access control lists (ACLs), the administrator can configure the firewall to allow only call control traffic to the servers. Typically, UDP-oriented...

Goals and Methods

To provide an understanding of different video conferencing deployment models, including centralized and distributed architectures, by using real-world examples. To explain how video conferencing infrastructure uses signaling standards to establish synchronized, secure conference connections. The book uses call flow diagrams to show each signaling message needed to create a conference. To provide a comparison of the most widely used video codecs, in a concise reference format.

H

H.224, FECC applications, 17-18 H.225, 188 gatekeepers, 217 messages, 188-189 Alerting, 190 Call Proceeding, 190 Connect, 190 Notify, 191 Release Complete, 191 Setup, 189-190 Setup ACK, 190 H.232v4, H.235, 313 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.235, 313 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.245, 191-192 DTMF relay support indicators, 193-194 messages CLC ACK, 201 Close...

H.225 Call Setup for Video Devices Using a Gatekeeper

The message sequence chart shown in Figure 6-16 illustrates two endpoints registering with a gatekeeper. The call flow shows endpoint A initiating a video call to endpoint B. In the diagram, both endpoints first register with the H.323 gatekeeper. After registration, endpoint A initiates a call to endpoint B using the gatekeeper direct endpoint signaling model...

H.225 Call Signaling

The H.225 recommendation describes the protocol for H.323 session control, including call initiation and connection management. It fully describes how an H.323 call is initiated, established, and disconnected. H.225 is derived from the Q.931 ISDN signaling standard, after modification for packet networks. It is based on Abstract Syntax Notation 1 (ASN.1) encoding. This section reviews common H.225 message types and content. H.225 uses a reliable TCP connection between devices on the IP network....

H.235.2

H.235.2 is a protocol that uses certificates to provide authentication and integrity for H.323 signaling. In addition, H.235.2 can provide nonrepudiation. When used within a single administrative domain, a certificate-based PKI provides a much more scalable way of distributing credentials than using preshared keys. H.235.2 does not specify how certificates should be distributed or how endpoints should validate certificates. H.235.2 allows endpoints to create a digital signature for a packet by...

H.245 Control Protocol

The H.245 recommendation provides the mechanism for the negotiation of media types and RTP channel establishment between endpoints. Using the H.245 control protocol, endpoints exchange details about the audio and video decoding capability each device supports. H.245 also describes how logical channels are opened so that media may be transmitted. Like H.225, H.245 messages are encoded using ASN.1 notation. The H.245 session information is conveyed to the calling device during the H.225 exchange....

H.264 Error Resilience

Table A-19 shows that H.264 offers many types of data resiliency. Table A-19 Data Resiliency for H.264 The higher complexity and flexibility of the H.264 codec allow it to deliver superior performance relative to the other codecs. An article published by the IEEE in 2003, "Rate-Constrained Coder Control and Comparison of Video Coding Standards," provides PSNR-versus-bit-rate graphs for several test sequences using real-time encoding. The results show the H.264 Baseline profile as the clear leader. The...

H.323 Gateways

H.323 gateways allow interworking between devices on the IP network and devices on other network types, such as the PSTN. The gateway provides transparent signaling and media conversion between packet- and circuit-switched networks, allowing endpoints to communicate with remote devices without regard for the signaling methodology used by those devices. Figure 6-11 shows an H.323 gateway interconnecting the H.323 and PSTN networks. Figure 6-11 Interfacing Between the H.323 and PSTN Networks...

H.323 Overview

H.323 is a widely deployed International Telecommunication Union (ITU) standard, originally established in 1996. It is part of the H.32x series of protocols and describes a mechanism for providing real-time multimedia communication (audio, video, and data) over an IP network. In this chapter, the intent is to familiarize you with some of the basic concepts involved in the H.323 architecture and signaling models, with an emphasis on voice and video conferencing. It does not attempt to cover all...

High-Resolution Video Input

Endpoints that intend to use the full resolution available from a standard video camera must use video data from both fields of each frame and therefore must use a video codec that handles interlaced video. When using video from an NTSC camera, endpoints that have an interlace-capable codec can support resolutions up to 640x480 at 60 fields per second. NOTE: Interlaced video can be de-interlaced using complex algorithms that attempt to expand each field into a full-resolution frame. The...

Hold and Resume

The user presses the Hold button on the phone to place the conference call on hold. The endpoint initiates a RE-INVITE and puts the audio stream in sendonly mode, as shown in Figure 5-15. Figure 5-15 EP in the Conference Presses the Hold Button In the following SDP offer/answer exchange, note that the endpoint adds the attribute line a=sendonly, causing audio to flow only from the EP to the conference server. The conference server responds with a=recvonly. The...
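The hold operation amounts to rewriting one direction attribute in the SDP. A minimal Python sketch of that rewrite follows; the function name and the single-media-section assumption are illustrative, not from the book (real SDP handling must track direction per m-line).

```python
def set_direction(sdp: str, direction: str) -> str:
    """Return a copy of the SDP with its direction attribute replaced
    (or appended if none was present). Assumes one media section."""
    directions = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}
    if "a=" + direction not in directions:
        raise ValueError("unknown direction: " + direction)
    # Drop any existing direction attribute, then append the new one.
    lines = [ln for ln in sdp.splitlines() if ln not in directions]
    lines.append("a=" + direction)
    return "\r\n".join(lines) + "\r\n"
```

The endpoint's hold offer applies set_direction(offer, "sendonly"); the conference server's answer applies "recvonly". Resume reverses both back to "sendrecv".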

How This Book Is Organized

Chapter 1 provides an overview of the conferencing models and introduces the basic concepts. Chapters 2 through 8 are the core chapters and can be read in any order; if you intend to read them all, the order in the book is an excellent sequence to use. The chapters cover the following topics: Chapter 1, Overview of Conferencing Services: This chapter reviews the elementary concepts of conferencing, describing the various types of conferences and the features found in each. It also provides an...

Human Perceptions

User-perceived objection to unsynchronized media streams varies with the amount of skew. For instance, a misalignment of audio and video of less than 20 milliseconds (ms) is considered imperceptible. As the skew approaches 50 ms, some viewers begin to notice the audio/video mismatch but cannot determine whether video is leading or lagging audio. As the skew increases further, viewers detect that video and audio are out of sync and can also determine whether video is leading or lagging...
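These thresholds can be summarized as a small classifier, sketched here in Python. The boundary values follow the text; the labels are illustrative, and actual perception varies by viewer.

```python
def skew_perception(skew_ms: float) -> str:
    """Classify audio/video skew (milliseconds, either sign) against the
    rough perceptual thresholds described in the text."""
    skew_ms = abs(skew_ms)
    if skew_ms < 20:
        return "imperceptible"
    if skew_ms < 50:
        return "noticeable, direction unclear"
    return "clearly out of sync"
```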

Hybrid Coding

The previous discussion covered the coding steps taken for intraframes. As discussed in the section Encoder and Decoder Overview, intraframes are coded using information only from the current frame, and not from other frames in the video sequence. However, other than Motion-JPEG, codecs for video conferencing use a hybrid approach consisting of spatial coding techniques discussed previously, along with temporal compression that takes advantage of frame-to-frame correlation in the time domain....

Hybrid Decoder

When analyzing a hybrid codec, it is easier to start by analyzing the decoder rather than the encoder, because the encoder has a decoder embedded within it. Figure 3-17 shows the block diagram for the hybrid decoder. The encoder creates a bitstream for the decoder by starting with an original image with frame number N, denoted by Fn^o. Because this frame is the original input to the encoder, it is not shown in the decoder diagram of Figure 3-17. For this image, the output of the encoder...
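The decoder's prediction loop can be sketched in a few lines of Python. This toy decoder assumes zero motion vectors and ignores transform and quantization, so each reconstructed frame is simply the previous reconstructed frame plus a decoded residual; frames are flat lists of sample values.

```python
def decode_sequence(intra_frame, residuals):
    """Toy hybrid decoder: the first frame is intra-coded; every later
    frame adds a residual to the previous reconstructed frame, mirroring
    the decoder loop that the encoder embeds within itself."""
    recon = [list(intra_frame)]
    for res in residuals:
        prev = recon[-1]
        recon.append([p + r for p, r in zip(prev, res)])
    return recon
```

Because the encoder runs this same loop internally, it predicts from *reconstructed* frames, keeping encoder and decoder state identical and preventing drift.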

Iii

The set of blocks on the left shows each possible coefficient location in the transform output array, and the set of blocks on the right shows the corresponding pixel pattern for each coefficient weighting value. In Figure 3-7, all the basis functions have been normalized so that the lowest-valued pixel in each basis function displays as black, and the highest-valued pixel displays as white. The coefficients correspond to frequency patterns as follows: Coefficients near...
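The pixel patterns like those in Figure 3-7 can be generated directly from the 2-D DCT definition. The Python sketch below computes the n x n basis pattern for coefficient (u, v); coefficient (0, 0) yields the flat DC pattern, and larger u or v yield patterns with more horizontal or vertical oscillations. The function name is illustrative.

```python
import math

def dct_basis(u, v, n=8):
    """Pixel pattern for coefficient (u, v) of an n x n 2-D DCT-II.
    u controls horizontal frequency, v controls vertical frequency."""
    def c(k):
        # Orthonormal scale factors for the DCT-II.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             * math.cos((2 * y + 1) * v * math.pi / (2 * n))
             for x in range(n)] for y in range(n)]
```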

Info

In this scenario, the encoder normally encodes every odd field to achieve 30 FPS. However, if the content of the video changes by a large amount as a result of excessive motion in the video stream, the encoder might fall behind for two reasons: The CPU requirements of the encoder might increase, resulting in higher per-frame encoding latency, which might force the encoder to reduce the frame rate. The extra motion in the input video might cause the size of the encoded frames to temporarily...

Intra Prediction

H.264 has an intra prediction mode that predicts pixels in the spatial domain before the intra transform process. For luminance, the encoder can use two different modes: a 16x16 prediction mode or a 4x4 prediction mode. For chrominance, the encoder can use an 8x8 prediction mode. In both cases, the pixels inside the block are predicted from previously decoded pixels adjacent to the block. The 16x16 prediction mode has four methods of prediction. Figure A-3 shows two modes. Figure A-3 Two of the...
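As an illustration, the DC method of 4x4 intra prediction fills the whole block with the rounded mean of the neighboring decoded pixels. The Python sketch below assumes both the above and left neighbors are available (H.264 defines fallbacks when they are not); the function name is illustrative.

```python
def predict_dc_4x4(above, left):
    """DC mode of H.264 4x4 luma intra prediction: every pixel of the
    block is the rounded mean of the 4 pixels above and 4 to the left."""
    assert len(above) == 4 and len(left) == 4
    dc = (sum(above) + sum(left) + 4) // 8   # +4 rounds to nearest
    return [[dc] * 4 for _ in range(4)]
```

The encoder then transforms and codes only the difference between the actual block and this prediction.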

IPsec

IPsec operates by applying encryption at the IP layer, below the TCP and UDP stack. Because IPsec applies to the lowest layers of the IP stack, endpoints typically implement it as part of the operating system kernel, independently of the upper-layer application. Therefore, the applications are unaware of the underlying security, but the IPsec tunnel protects the UDP and TCP packets. However, administrators and users must manually configure IPsec on the originating and terminating endpoints and...

ISDN Gateway

In the early days of IP video conferencing, the only practical way to allow NAT/FW traversal between enterprises was to circumvent the problem by using H.320 ISDN gateways to connect two endpoints over the public switched telephone network (PSTN). Figure 8-14 shows the topology for interenterprise H.323 connectivity, in which two endpoints connect over the PSTN WAN. Figure 8-14 Using ISDN to Circumvent the NAT/FW Traversal Problem The major downside of this approach is the added delay of...

Joining a Scheduled or Reservationless Conference

At meeting time, each participant in a scheduled or reservationless conference typically dials the access number provided, which usually connects to an IVR system. The IVR prompts the participant to enter the meeting ID number and might ask the participant to "speak your name at the tone" for a recorded name announcement. When the IVR connects the participant to the conference, it plays the recorded name for all participants to hear. Alternatively, each participant might enter a predefined...

Key Distribution

For two endpoints to use symmetric encryption for media or signaling, the endpoints must agree to use a common key for both encryption and decryption, a process called key distribution or key agreement. As mentioned previously, one method of performing key distribution is to distribute preshared keys out-of-band in a secure manner. However, this method of key distribution does not scale well. Two other methods of key distribution include certificate-based distribution and Diffie-Hellman key...
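A Diffie-Hellman exchange can be sketched in a few lines of Python. The group parameters below are tiny demonstration values, not a secure group; the point is that each side combines its own private exponent with the peer's public value and both derive the same shared secret without ever transmitting it.

```python
import secrets

def dh_keypair(p, g):
    """Pick a private exponent and derive the matching public value.
    Real deployments use large standardized groups, not demo sizes."""
    priv = secrets.randbelow(p - 2) + 1
    return priv, pow(g, priv, p)

# Tiny, insecure demo parameters.
p, g = 23, 5
a_priv, a_pub = dh_keypair(p, g)   # endpoint A
b_priv, b_pub = dh_keypair(p, g)   # endpoint B

# Each side raises the peer's public value to its own private exponent.
shared_a = pow(b_pub, a_priv, p)
shared_b = pow(a_pub, b_priv, p)
```

Both computations yield g^(a*b) mod p, so shared_a always equals shared_b; an eavesdropper seeing only a_pub and b_pub cannot feasibly recover it for properly sized groups.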

Layer 2 Attacks

Several attacks are possible at Layer 2, the Ethernet link layer. These attacks often require the attacker to have direct access to the internal network. Layer 2 attacks are extremely virulent because after an attacker compromises Layer 2, the layers above it might not detect the attack. Solution: Add security at Layer 2 within the network. A deployment that implements Layer 2 protection inside the network and Layer 3 firewall protections at the edge achieves layered security. An enterprise...

Lecture Mode and Round Robin Conferences

One presentation variant is called lecture mode. This mode uses a layout with a large subpicture showing the lecturer. Video streams of students occupy smaller subpictures. The lecturer subpicture is locked, and the student subpictures operate in continuous presence mode with voice-activated priority, so that a student asking a question becomes active in one of the smaller subpictures. The lecturer may receive a video stream with a different layout than the layout presented to students. The...

Lecture Mode Conferences

A lecture mode conference has a lecturer who presents a topic, and the rest of the participants can ask questions. There are two different styles of lecture mode meetings: Open: Open meetings allow participants to ask questions at any time without requesting permission to speak. Controlled: In a controlled meeting, the meeting administrator or lecturer must give a participant permission to ask questions or speak. If the administrator denies the request from an audience member to ask a question, the...

Lip Sync Policy

The receiver may decide not to attempt lip sync for synchronized audio and video streams in certain circumstances, even if lip sync is possible. There are two scenarios in which this situation might occur: Excessive audio delay: If the receiver must delay audio to establish lip sync, the receiver might instead choose the lower audio latency of unsynchronized streams, because lower end-to-end audio latency achieves the best real-time interaction. The receiver...

Lip Synchronization in Video Conferencing

Chapter 3, Fundamentals of Video Compression, went into detail about how audio and video streams are encoded and decoded in a video conferencing system. However, the last processing step in the end-to-end chain involves ensuring that the decoded audio and video streams play with perfect synchronization. This chapter focuses on audio and video; however, video conferencing systems can synchronize any type of media to any other type of media, including sequences of still images or 3D animation. Two...

Low-Resolution Video Input

If the video endpoint is configured to send low-resolution video, the endpoint typically starts with a full-resolution interlaced video sequence and then discards every other field. The resulting video has full resolution in the horizontal direction but half the resolution in the vertical direction, as shown in Table 7-2. Table 7-2 Video Formats Field Sizes When capturing from a typical interlaced camera and using only one of the fields, the encoder must...
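Discarding one field is a simple row-subsampling step: an interlaced frame stores the two fields on alternating lines, so keeping every other line keeps one field. The Python sketch below represents a frame as a list of rows; the representation and function name are illustrative.

```python
def extract_field(frame_rows, field=0):
    """Keep one field of an interlaced frame: rows 0, 2, 4, ... for the
    top field (field=0) or rows 1, 3, 5, ... for the bottom (field=1).
    Horizontal resolution is untouched; vertical resolution is halved."""
    if field not in (0, 1):
        raise ValueError("field must be 0 or 1")
    return frame_rows[field::2]
```

For example, a 640x480 interlaced frame reduces to a 640x240 field image.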

M

Macroblocks, 101-102, 172 malleable playout devices, 244 malware, 262 mapping characteristics of NAT, 278-279 matrix quantization, 61 MC (multipoint controller), 10 MCTF (motion-compensated temporal filtering), 353 MCUs (multipoint control units), 9, 26, 209 MC, 10 service prefixes, 219-220 transrating, 12 media control support for ad hoc video conferencing, 172-173 media encryption MIKEY, 313 security-descriptions, 312 media multiplexing, 294 media plane, 22, 27 generation module, 32 speaker...

Macroblocks

For all codecs in this chapter, an MB consists of a 16x16 array of luminance values and two 8x8 arrays of chrominance values in 4:2:0 format, shown in Figure 3-35. Different codecs may further subdivide the MB in different ways. The H.261 and H.264 codecs represent two ends of the spectrum: For motion estimation, H.261 applies a motion vector to the entire 16x16 MB. In contrast, H.264 allows an MB to be subdivided in several ways. At the finest level of subdivision, H.264 can divide the 16x16 MB into...
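The 4:2:0 sample counts are easy to verify: each chroma plane has half the luma resolution in both directions, so a 16x16 macroblock carries 256 luma samples plus two 8x8 chroma arrays. A small Python sketch (function names are illustrative):

```python
def chroma_dims_420(luma_w, luma_h):
    """In 4:2:0 sampling, each chroma plane has half the luma
    resolution both horizontally and vertically."""
    return luma_w // 2, luma_h // 2

def macroblock_sample_count(w=16, h=16):
    """Total samples in one 4:2:0 macroblock: luma plus two chroma blocks."""
    cw, ch = chroma_dims_420(w, h)
    return w * h + 2 * cw * ch   # 256 + 2 * 64 = 384
```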

Man-in-the-Middle Attacks

A MitM attack occurs when an attacker inserts a rogue device between two connected endpoints. The MitM can then listen to packets that flow between the endpoints and can modify packets in transit. The MitM is invisible to the two endpoints, which are unaware of the attack. One way for an attacker to become a MitM is to spoof the identity of each endpoint to the other. Figure 8-3 shows this scenario. Figure 8-3 A Man-in-the-Middle Attack Between Two Endpoints...

Media Control and Transport

Endpoints and conferencing systems in an IP network send voice and video packets via the Real-time Transport Protocol (RTP). RTP has a companion protocol, the RTP Control Protocol (RTCP), which provides information about the RTP streams related to packet statistics, reception quality, network delays, and synchronization. This chapter addresses the following topics: Basics of RTP and RTCP and their usage in conferencing systems Different RTP devices used in the conferencing architectures RTP...

Mid Call Bandwidth Requests

When a device needs to modify the session bandwidth during a call, it sends a bandwidth request message to the gatekeeper. For instance, an endpoint might need to request additional bandwidth when it adds video streams to an existing call. Endpoints adjust the bandwidth by sending a Bandwidth Request (BRQ) message to the gatekeeper with the new bandwidth requirement. If the bandwidth is available, the gatekeeper grants the request, signaled via the Bandwidth Confirm (BCF) message. If the...
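The grant/reject decision is essentially bookkeeping against a bandwidth pool. The Python class below is an illustrative sketch, not a real gatekeeper: it confirms a BRQ only when the requested session bandwidth, together with all other active allocations, fits the configured capacity.

```python
class Gatekeeper:
    """Toy gatekeeper bandwidth arbitration. Returns "BCF" (confirm)
    or "BRJ" (reject) for each bandwidth request; names illustrative."""

    def __init__(self, capacity_kbps):
        self.capacity = capacity_kbps
        self.allocated = {}          # call_id -> granted kbps

    def bandwidth_request(self, call_id, new_kbps):
        # Exclude this call's current grant: a BRQ replaces it.
        in_use = sum(self.allocated.values()) - self.allocated.get(call_id, 0)
        if in_use + new_kbps > self.capacity:
            return "BRJ"
        self.allocated[call_id] = new_kbps
        return "BCF"
```

For example, a call that adds video would issue a BRQ for its new, higher session bandwidth, and the gatekeeper either confirms it or rejects it against the remaining pool.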

MIKEY

Another key exchange method is Multimedia Internet Keying (MIKEY). The base MIKEY specification is defined in RFC 3830, and the method that describes using it with SDP information is RFC 4567. Like s-descriptions, MIKEY inserts the key material as a parameter entry inside the SDP section of the SIP message. However, unlike s-descriptions, MIKEY encrypts this SDP entry. One of the benefits of MIKEY is that the SDP information, and therefore the SIP messaging, can transit in the clear, without an...

Motion Vectors

For the purpose of assigning MVs, each 16x16 MB may be segmented in several ways: as a single 16x16 block, as two 8x16 blocks, as two 16x8 blocks, or as four 8x8 blocks. The four-8x8 segmentation mode allows any of the 8x8 blocks to be further subdivided as two 4x8 blocks, two 8x4 blocks, or four 4x4 blocks, as shown in Figure A-2. Figure A-2 Segmentation of a Macroblock in H.264 As a result, an H.264 MB may contain 16 4x4 blocks, and in a B-frame, each block may have up to two MVs, for a total of 32...

Multiple Stream Support and Grouping of Media Lines

Advanced video endpoints may ask the conference server to send multiple video streams. The initial INVITE has one audio m-line (media line) and multiple video m-lines. Multiple video stream capability requires the ability to group the media lines so that the conference server knows which audio stream and video streams are tied together for lip-sync purposes. RFC 3388 defines the group attribute for this grouping. NOTE: In the group semantics, LS stands for lip...
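A sketch of what such a grouped SDP body might look like, built in Python. Per RFC 3388, an a=group:LS line lists the mid identifiers of the m-lines that must stay synchronized; the mid labels and port numbers below are illustrative, not from the book.

```python
def build_grouped_sdp(audio_port, video_ports):
    """Build a minimal SDP body tying one audio m-line to several
    video m-lines for lip sync via RFC 3388 group:LS semantics."""
    mids = ["a1"] + ["v%d" % i for i, _ in enumerate(video_ports, 1)]
    lines = ["v=0",
             "a=group:LS " + " ".join(mids),
             "m=audio %d RTP/AVP 0" % audio_port,
             "a=mid:a1"]
    for mid, port in zip(mids[1:], video_ports):
        lines += ["m=video %d RTP/AVP 96" % port,
                  "a=mid:%s" % mid]
    return "\r\n".join(lines) + "\r\n"
```

Each m-line carries an a=mid label, and the a=group:LS line names the set the receiver must play in sync.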

Mute and Unmute

An endpoint can mute itself using one of two methods: The endpoint can halt transmission of audio/video media packets to the conference server. The endpoint can request that the conference server ignore packets from the endpoint. An endpoint can instruct a conference server to ignore audio or video media packets by sending the proper DTMF tones. In Figure 5-16, the key sequence 5 notifies the conference server that the endpoint wants to be muted. In response, the conference server plays an...

Muting and Ejecting Participants

The muting and ejecting participants feature allows a conference administrator to mute the incoming voice stream from a participant or remove a participant from the conference. A participant might need to be muted when calling from an environment with much background noise or when the participant has placed the call on hold and music on hold is configured on the participant's phone. When a meeting agenda changes, it might be necessary to restrict the attendee list and remove certain...

N

NAT (Network Address Translation), 276-277 complications for VoIP protocols, 284-285 filtering characteristics, 279 address- and port-dependent filtering, 281 endpoint-dependent filtering, 281 endpoint-independent filtering, 279-281 mapping characteristics, 278-279 symmetric NAT, 282-283 NAT FW (NAT firewall traversal), 270 ICE, 298-299 solution requirements, 285-286 H.460 solution, 289 H.460.17 solution, 290-291 H.460.18 solution, 291-293 H.460.19 solution, 293-294 IP-IP gateway inside firewall...

NAT Classifications

A NAT is classified by two attributes: Mapping characteristics: How the NAT allocates a new external mapped address/port for an internal private address/port. Filtering characteristics: How the NAT determines whether to forward a packet from the public address space to the private address space after the NAT creates a binding. For any of these mapping and filtering modes, the following sequence of events creates a NAT binding: 1. An internal endpoint with source address Ai...

NAT Complications for VoIP Protocols

NAT presents multiple problems for video conferencing and VoIP protocols, such as the following: External endpoints cannot connect to an internal endpoint in the private address space until the internal endpoint creates a NAT binding by sending packets to the external endpoint. In other words, internal endpoints may not receive unsolicited connections. Of course, this restriction may be considered a security feature. However, one of the goals of NAT traversal is to allow authorized external...

NAT Mapping Characteristics

The mapping characteristic of a NAT describes how the NAT allocates the external address/port Am:Pm based on the internal source address/port Ai:Pi. The NAT may implement two main types of mapping: Endpoint-independent mapping: The internal endpoint may send packets with source Ai:Pi to multiple external endpoints, each with different addresses. Figure 8-7 shows a NAT that implements endpoint-independent mapping. In this case, the NAT uses the same external mapped address Am:Pm for packets destined...
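Endpoint-independent mapping can be illustrated with a toy simulator. In the Python sketch below (class and method names are mine, and addresses are plain tuples), the destination never enters the binding key, so repeated sends from the same internal address/port reuse one external mapping regardless of where the packets go.

```python
class EndpointIndependentNat:
    """Toy NAT with endpoint-independent mapping: one external
    (address, port) per internal (address, port), reused for
    every destination."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.next_port = 40000
        self.bindings = {}          # (Ai, Pi) -> (Am, Pm)

    def map_outbound(self, src, dst):
        # dst is intentionally ignored in the binding key.
        if src not in self.bindings:
            self.bindings[src] = (self.external_ip, self.next_port)
            self.next_port += 1
        return self.bindings[src]
```

Under endpoint-dependent mapping, by contrast, the key would include dst, and each new destination would receive a fresh external port; that difference is what makes some NATs hard for STUN-style traversal.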

NAT/FW Traversal Solutions

NAT/FW traversal refers to the capability of video conferencing endpoints to connect to each other across NATs and firewalls. Because firewalls often include NAT capability, the term firewall traversal also applies to NAT/FW traversal. Solutions for firewall traversal should ideally satisfy several requirements. One requirement is simplicity: if a traversal solution requires a special firewall configuration, that configuration must be as simple as...