A

The SIP client issues an INVITE to the server, attempting to connect to a protected resource. The server rejects this initial request and issues a challenge to the client. In the case of a SIP server, the challenge is a WWW-Authenticate header (borrowed from HTTP digest authentication) carried in a 401 Unauthorized response. The following shows some of the information contained in the challenge message. The message contains several parameters, and included in this parameter list are the name of the realm and a nonce value. The nonce...
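As an illustrative sketch (the header values here are hypothetical, not taken from the book's example), a digest challenge carried in a SIP 401 response looks like the following:

SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds
From: <sip:alice@example.com>;tag=1928301774
To: <sip:conference@example.com>;tag=a6c85cf
Call-ID: a84b4c76e66710
CSeq: 1 INVITE
WWW-Authenticate: Digest realm="example.com",
 nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
 algorithm=MD5, qop="auth"
Content-Length: 0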

About the Authors

Scott Firestone holds a master's degree in computer science from MIT and has designed video conferencing and voice products since 1992, resulting in five patents. During his 10 years as a technical leader at Cisco, Scott developed architectures and solutions related to video conferencing, voice and video streaming, and voice-over-IP security. Thiya Ramalingam is an engineering manager for the Unified Communications organization at Cisco. Thiya holds a master's degree in computer engineering and...

About the Technical Reviewers

Jesse Herrera is a senior systems analyst for a Fortune 100 company in Houston, Texas. Mr. Herrera holds a bachelor of science degree in computer science from the University of Arizona and a master of science in telecommunication management from Southern Methodist University. His responsibilities have included design and implementation of enterprise network architectures, including capacity planning, performance monitoring, and network management services. His recent activities include...

Accessing the Focus

The central entity in the distributed architecture is called the focus. The focus maintains a signaling relationship with all the endpoints (or participants) in the conference. Conference and participant operations, such as creating, maintaining, and destroying conferences and adding and deleting participants, occur in the focus. Each conference must have a unique address of record (AoR) that corresponds to a focus. A conference server could contain multiple focus instances, and each focus may control a...

Acknowledgments

Nermeen Ismail provided a cover-to-cover review of the book, lending considerable expertise in video and voice over IP. Jesse Herrera also provided a full review, verifying all parts of the text in minute detail. The authors are particularly grateful to Stuart Taylor for providing a number of suggestions and comments on the introduction and architecture chapters, and to Tripti Agarwal for taking time to review the H.323 section and provide her insight on CallManager signaling implementation details...

Ad Hoc Conference Initiation: Conference Button

The Conference button on the phone creates an ad hoc conference by expanding a two-party call into a multiparty conference. Consider the following call scenario:
1. Bob places a call to Alice, and Alice answers.
2. Bob decides to include Fred in the call. Bob presses the Conference button to put Alice on hold.
3. Bob places a call to Fred, and Fred answers. Bob announces that he will include Fred in the preexisting conversation with Alice.
4. Bob presses the Conference button again to connect...

Ad Hoc Conference Initiation: Meet Me Button

A Meet Me conference is one in which a number of destination telephone numbers are set aside for conferencing purposes. Each number corresponds to a unique conference that users can join on an ad hoc basis. Administrators set up these numbers by configuring the local phone system to forward these calls to a conference server. After the phone system redirects the calls, the conference server manages them independently. When these numbers are known, any caller can join them. Security consists of...

Ad Hoc Conferences

As previously stated, ad hoc conferences are the simplest form of meeting. Phone users create them in two ways: when the meeting host presses the Conference button on the phone, which enables a user to escalate an existing two-party call into one with multiple participants, or by using the Meet Me option on the phone. Ad hoc meetings do not reserve resources in advance and do not require participants to interact with a voice user interface before joining the meeting.

Ad Hoc Video Conferencing

A video-enabled endpoint uses the same procedure to join a conference but offers additional parameters in the SDP offer to describe the properties of the video media stream. Example 5-4 shows an SDP offer, in which endpoint A sends an INVITE to the conference server.

Example 5-4 SDP Offer from an Endpoint for Joining an Ad Hoc Video Conference

o=san 1549546120 0 IN IP4 10.10.10.26
c=IN IP4 10.10.10.26
m=audio 49220 RTP/AVP 0 8
m=video 49222 RTP/AVP 109 34 96 31
a=rtpmap:109 H264/90000
a=fmtp:109...

Address- and Port-Dependent Filtering

Figure 8-11 shows a NAT that implements address- and port-dependent filtering. Figure 8-11 Address- and Port-Dependent Filtering
After the NAT creates the binding, it forwards a packet from the external network to the internal network only if:
The source address:port of the packet is Ae:Pe
The destination address:port of the packet is Am:Pm
In this case, only the endpoint that received the packet can send a packet back to the internal network, and the packet must have a source port equal to the...
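The filtering decision can be summarized in a few lines of Python. This is a simplified sketch of the check only; the names and tuple layout are illustrative, not an actual NAT implementation.

from collections import namedtuple

Packet = namedtuple("Packet", "src_addr src_port dst_addr dst_port")
Binding = namedtuple("Binding",
                     "mapped_addr mapped_port ext_peer_addr ext_peer_port")

def allow_inbound(packet, binding):
    # Address- and port-dependent filtering: forward only if the packet comes
    # from the exact external peer (Ae:Pe) the internal host contacted, and is
    # addressed to the mapped address:port (Am:Pm).
    return ((packet.src_addr, packet.src_port) ==
            (binding.ext_peer_addr, binding.ext_peer_port) and
            (packet.dst_addr, packet.dst_port) ==
            (binding.mapped_addr, binding.mapped_port))

# Example: the binding was created when the internal host sent to 198.51.100.7:5004.
b = Binding("203.0.113.2", 62000, "198.51.100.7", 5004)
print(allow_inbound(Packet("198.51.100.7", 5004, "203.0.113.2", 62000), b))  # True
print(allow_inbound(Packet("198.51.100.7", 9999, "203.0.113.2", 62000), b))  # False: wrong source port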

Annex C

Annex C provides facilities to support switched multipoint operation. The facilities include the following:
Freeze picture request: causes the decoder to freeze the displayed picture until a freeze picture release signal is received or a timeout period of at least 6 seconds has expired. This signal is transmitted either by external means such as H.245 or by using supplemental services (annex L).
Fast update request: causes the encoder to encode its next picture in intra mode. This signal is transmitted...

Annex L

This annex provides an opportunity for an encoder to send commands to the decoder. These command requests include the following:
Full picture freeze request.
Partial picture freeze request.
Resizing partial picture freeze request.
Partial picture freeze release.
Full picture snapshot tag. This indicates that the current picture is labeled for external use as a still image snapshot of the video content. This option is useful for conference recording.
Partial picture snapshot tag. The same as the...

Annex N

This annex provides a reference picture selection mode. This mode provides two features:
The encoder can use a number of picture memories and select one of them as the reference picture in the encoding of the current frame. The amount of picture memory available at the decoder might be signaled via external means to help the memory management in the encoder.
The decoder may use a back channel to send the encoder information on which parts of which pictures have been correctly decoded at the...

Annex P

This annex provides a reference picture resampling mode. This feature is a resampling process that can be applied to the previous decoded picture to generate a warped picture for use in predicting the current picture. This mode is used in specifying the relationship between the current picture and its reference if the source format differs. This mode may be used in restricted scenarios defined during capability negotiations. For example, encoders/decoders might support only factor-of-4 picture...

Annex Q

This annex provides a reduced resolution update mode. This mode is used for fast-moving video sequences. The encoder is allowed to send update information for a picture that is encoded at a reduced resolution while preserving the detail in a higher-resolution reference image. This creates a final image at the higher resolution. This capability allows the coder to increase the picture update rate while maintaining its subjective quality. The syntax of the bitstream when using this mode is...

Annex U

This annex provides an enhanced reference picture selection mode. Annex U provides benefits for both error resilience and coding efficiency by using a memory buffer of reference pictures. It allows the following:
Pictures to be predicted from multiple reference pictures at the MB level. This mode enhances the coding efficiency.
Motion compensation to be extended to prediction from multiple pictures. Each MV is extended by a picture reference number that may index any of the multiple reference...

Annex W

This annex provides additional supplemental enhancement information. Annex W defines two values that were reserved in annex L:
Fixed-point inverse DCT (IDCT) indicates that a particular IDCT approximation is used to construct the bitstream. The annex specifies a particular reference IDCT implementation.
Picture message indicates one or more octets representing message data. The annex specifies several message types: Caption text. Note that this recommendation puts no restriction on how caption...

Annex X

Annex X defines profiles and levels for H.263. Of particular interest for video conferencing is section 2.6, which defines profile 5, also known as the Conversational High Compression (CHC) profile. This profile allows low-latency, real-time video encoding for video conferencing endpoints. This profile defines several features and limitations:
All the attributes of the H.263 Baseline profile, in addition to the following.
Annex F, advanced prediction mode, which allows four MVs per MB, and the...

ARP Cache Poisoning

When a host attempts to send a packet to an IP address on the same subnet, the originating host must discover the Ethernet MAC address corresponding to the destination IP address. The originating host learns about this mapping by issuing an ARP request packet, which requests the MAC address used by the destination IP address. The destination machine receives this request and responds with an ARP reply that contains the MAC address. The originating host caches this IP-to-MAC address mapping into...

Asymmetric Encryption: Public Key Cryptography

Unlike symmetric encryption, where both sender and receiver use the same key, public key encryption uses two keys. In this approach, each endpoint creates a public key and a private key. Each endpoint keeps the private key secret but makes the public key widely available. Public key cryptography can perform two major functions: encryption and integrity protection. When used for encryption, public key cryptography relies on the fact that data encrypted with the public key can be decrypted only...
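As a rough illustration (not from the book; it assumes the third-party Python cryptography package), the following sketch shows both functions: encrypting a small secret with the recipient's public key, and signing a message with the sender's private key.

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# The receiving endpoint generates a key pair and publishes the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Confidentiality: anyone can encrypt with the public key,
# but only the private-key holder can decrypt.
secret = b"session key material"
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = public_key.encrypt(secret, oaep)
assert private_key.decrypt(ciphertext, oaep) == secret

# Integrity/authentication: sign with the private key, verify with the public key.
message = b"signaling message"
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = private_key.sign(message, pss, hashes.SHA256())
public_key.verify(signature, message, pss, hashes.SHA256())  # raises if tampered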

Audio Mixer

Within a conference, the audio mixer is responsible for selecting the input streams and summing these streams into a mixed output stream. This section provides a detailed view into the various modules that comprise it. The audio mixer is the core component in the media plane. It is responsible for selecting incoming audio streams, summing them, and distributing the summed output back to the participants. When mixing audio streams in a large conference, the audio mixer selects only a subset of...
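The selection-and-sum step can be sketched in a few lines of Python. This is illustrative only; a real mixer works on decoded PCM frames, removes each listener's own contribution, and applies gain control, as hinted at below.

def mix_frame(frames_by_participant, max_speakers=3):
    # frames_by_participant: participant id -> list of PCM samples for one frame.
    # Returns one mixed frame per participant, excluding that participant's own audio.
    energy = {pid: sum(s * s for s in frame)
              for pid, frame in frames_by_participant.items()}
    selected = sorted(energy, key=energy.get, reverse=True)[:max_speakers]

    outputs = {}
    for listener, frame in frames_by_participant.items():
        mixed = [0] * len(frame)
        for pid in selected:
            if pid == listener:          # never echo a speaker back to itself
                continue
            for i, sample in enumerate(frames_by_participant[pid]):
                mixed[i] += sample
        # Clamp to 16-bit range to avoid wraparound after summing.
        outputs[listener] = [max(-32768, min(32767, s)) for s in mixed]
    return outputs

# Example with short frames standing in for real 20 ms frames:
frames = {"A": [1000] * 8, "B": [200] * 8, "C": [-50] * 8, "D": [3000] * 8}
print(mix_frame(frames)["C"])   # C hears A, B, and D summed together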

Audio Receiver Path

The receiver requires the jitter buffer in the audio path because packets arriving at the receiver do not have uniform arrival times. The sending endpoint typically sends fixed-sized RTP packets onto the network at uniform intervals, generating a stream with a constant audio bit rate. However, jitter in the network due to transient delays causes nonuniform spacing between packet arrival times at the receiver. If the network imposes a temporary delay on a sequence of several packets, those...

C

CA (certificate authority), 303 calculating Call Proceeding messages (H.225), 190 CAs (certificate authorities), certificate enrollment process, 306-307 CDP (CRL Distribution Point), 307 centralized conferencing architecture, 37-38 centralized multipoint conferencing model, 157 certificate-based key distribution, 309 certificates, 302-304 endpoint authentication, 307 enrollment, 306-307 installing, 305-306 nonrepudiation, 309 reenrollment, 309 revoking, 307-309 CHC (Conversational High...

Call Hold Signaling with the Empty Capability

To indicate to the remote device that a hold operation is in progress, the endpoint initiating the hold operation sends a special form of the TCS, known as the ECS message, sometimes referred to as TCS=0. The ECS is a TCS with all capability fields set to null, and support for it is a mandatory part of H.323 Version 2 and later. It does not disconnect the call, but simply informs the remote side that the sender does not currently have any decoding capability. As a result, the remote side closes...

Call Transfer with the Empty Capability

Call transfer using ECS requires that the phones involved use a common H.323 signaling agent. When a call is connected and the transfer button is pressed, the H.323 call signaling agent in the transferring phone sends ECS to the remote device, and media is closed. When the party to which the call was transferred answers, the transfer button is pressed again, and the H.323 call signaling agent sends a new TCS and negotiates media on behalf of the phone to which the call was transferred. For...

CAM Table Flooding

One Layer 2 exploit is a content-addressable memory (CAM) table flood, which allows an attacker to make a switch act like a hub. A hub forwards all packets to all ports. A switch learns about Ethernet MAC addresses at each of its ports so that it can forward packets only to the port that provides a link to the destination address of the packet. In a heavily switched environment, an attacker receives only packets destined for the attacker. By exploiting a CAM table flood, the attacker can cause...

Canonical RTP Model

Figure 7-12 shows the canonical RTP/RTCP model for a video/audio sender and receiver. Figure 7-12 Canonical RTP/RTCP Model
Figure 7-12 shows five different clocks. At the sender:
Clock A, used by the audio capture hardware to sample audio data
Clock B, used by the video capture hardware to sample video data
Clock C, the common timebase clock at the sender, used for the purposes of stream synchronization with RTCP packets
At the receiver:
Clock D, the clock used by the audio playout hardware to play audio data...

Centralized Architecture

In a centralized model, all the components of a conferencing system are implemented in a single server. Figure 2-6 shows an example of a centralized conferencing system with the necessary software modules. These software modules interact with each other through the interprocess communication methods provided by the operating system running in that server. Figure 2-6 Centralized Conferencing System with Software Modules The...

Codecs, Bit Rates, and Annexes Supported by Endpoints

Table A-23 identifies the annexes and codecs supported by different enterprise endpoints. For example, the Polycom ViewStation supports annexes F, I, and T at 64K and 128K bit rates. The Polycom VSX 3000 and VSX 7000 support H.261, H.263, H.263-1998, and H.264, and also support SIP signaling. Cisco soft clients include Cisco Unified Personal Communicator (CUPC) and Cisco Unified Video Advantage (CUVA); E-Conf Version 4 supports the H.264 baseline profile.

Color Formats

The color and brightness information for pixels can be represented in one of several data formats. The two common formats are RGB and YCbCr. The RGB format represents each pixel using values for the red (R), green (G), and blue (B) additive color components. The YCbCr format represents each pixel using the brightness value (Y), along with color difference values (Cb and Cr), which together define the saturation and hue (color) of the pixel. The brightness values comprise the luminance channel,...
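For reference, a common conversion from 8-bit RGB to YCbCr uses the ITU-R BT.601 coefficients. The short Python sketch below is illustrative and ignores the studio-range scaling that some systems apply.

def rgb_to_ycbcr(r, g, b):
    # Convert one 8-bit RGB pixel to full-range YCbCr (BT.601 coefficients).
    y  = 0.299 * r + 0.587 * g + 0.114 * b          # luminance
    cb = 128 + 0.564 * (b - y)                      # blue color difference
    cr = 128 + 0.713 * (r - y)                      # red color difference
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr(255, 0, 0))      # pure red: Cb below 128, Cr well above 128
print(rgb_to_ycbcr(128, 128, 128))  # gray: Y = 128, Cb = Cr = 128 (no color)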

Command Syntax Conventions

The conventions used to present command syntax in this book are the same conventions used in the IOS Command Reference. The Command Reference describes these conventions as follows:
Boldface indicates commands and keywords that are entered literally as shown. In actual configuration examples and output (not general command syntax), boldface indicates commands that are manually input by the user (such as a show command).
Italic indicates arguments for which you supply actual values.
Vertical bars...

Common Reference Lip Sync

The goal of lip sync is to preserve the relationship between audio and video in the presence of fluctuating end-to-end delays in both the network and the endpoints themselves. Therefore, the most important restriction to keep in mind when discussing lip sync for video conferencing is the following: Video conferencing systems cannot accurately measure or predict all delays in the end-to-end path for either the audio or video stream. This restriction leads to the most important corollary of lip...

Compensating for Network Issues: The Jitter Buffer

Receivers must handle three potential anomalies in the input audio stream:
RTP packets arriving at a receiver may exhibit variability in arrival times (jitter), encountered during transmission over the network.
Packets may arrive at the mixer in the incorrect order.
RTP packets can be duplicated in the network, resulting in two or more of the same packet.
However, for the mixer to operate properly, it must receive a stream of packets with uniform interpacket spacing, in the order they were...
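A minimal sketch of the reordering and de-duplication part of a jitter buffer follows (Python, illustrative only; it ignores the adaptive playout delay and clock-drift handling a real implementation needs).

import heapq

class JitterBuffer:
    # Re-orders packets by RTP sequence number and drops duplicates.
    # Playout is delayed by 'depth' packets to absorb arrival-time jitter.

    def __init__(self, depth=4):
        self.depth = depth
        self.heap = []            # min-heap keyed on sequence number
        self.seen = set()         # sequence numbers already buffered

    def push(self, seq, payload):
        if seq in self.seen:      # duplicate packet from the network
            return
        self.seen.add(seq)
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        # Return the next packet in sequence order once enough are buffered.
        if len(self.heap) >= self.depth:
            return heapq.heappop(self.heap)
        return None               # keep buffering (underrun protection)

jb = JitterBuffer(depth=3)
for seq in (102, 100, 101, 101, 103, 104):   # out of order, with a duplicate 101
    jb.push(seq, b"frame")
print([jb.pop()[0] for _ in range(3)])       # [100, 101, 102]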

Components of a Conferencing System

A conferencing system is composed of several components, including a user interface, a conference policy manager, media control, a player/recorder, and other subsystems. This section explores these individual elements, providing details about the functionality found in each service and how together they make up a conferencing system. Figure 2-1 shows the major layers of a conferencing system:
User interface: The user interface typically consists of several separate interfaces: A scheduler to...

Conference Control

The conference control layer has three main functions:
Conference management and scheduling: The conference scheduler works with the resource allocation module to reserve ports during the time window when meetings are scheduled to be active. The resource allocation module is aware of how the administrator has configured the system with respect to conferencing, floater, and overbook ports and uses this information when responding to resource allocation requests. At meeting time, after the user has...

Conference Policy Server

The conference policy server is the repository for the various policies stored in the system. There is only one instance of the conference policy server within the system. No standard protocol exists for communication between the focus and the policy server. Users join a conference by sending a SIP INVITE to the unique URI of the focus. If the conference policy allows it, the focus connects the participant to the conference. When a participant SIP endpoint wants to leave the conference, the...

Conference Types

The three main conferencing models are ad hoc, reservationless, and scheduled conferencing. Ad hoc conferencing is the most basic model and has the fewest features. It is also the easiest for the end user to create, because ad hoc conferences are created simply by pressing the Conference button on the user's phone. Reservationless conferencing is the next most basic model and usually is created using the telephone keypad, after the user has called into the conference bridge. Both ad hoc and...

Conference URI

A conference in a SIP framework is identified through a conference URI. The conference URI is the destination to which all the SIP requests are sent, and it is created and managed by the conference server. An example of a conference URI is sip:meetingplace@cisco.com. Users can enter these URIs manually in their SIP client to dial into the conference system. Alternatively, the conference system embeds this in a web link and sends the link to the user through e-mail or instant messenger. If the user dials in...

Conferencing Architectures

Conferencing architectures can be classified into two basic models: centralized and distributed. A centralized architecture provides multiple services to video conferencing endpoints, but one single, standalone device provides each service. This approach is the most common architecture for audio and video conferencing systems. Centralized architecture provides single points for administration and management. Adding new functionality involves simply upgrading one device in the network. In a...

Conferencing System Design and Architecture

This chapter examines various conferencing system architectures, their design, and the interactions of the modules that comprise the system. Details are provided about the user interface, conference control, and control and media planes from which conferencing systems are constructed. The later sections of this chapter discuss architectural models. In addition, specific conferencing system features and operational modes are reviewed in detail. Topics include the role of a conference moderator,...

Confidentiality Attacks

Without confidentiality, an attacker can listen to the audio and video streams between two endpoints. Hacker tools are available on the Internet for eavesdropping on voice packet data. One of these tools is called VOMIT (Voice Over Misconfigured IP Telephony). VOMIT processes a stream of captured voice packets and plays the audio. Solution: Apply encryption to the media packets. Vendors of conferencing products are universally adopting the Advanced Encryption Standard (AES) to encrypt media...

Connection Hijacking

After two video conferencing endpoints establish a legitimate connection, an attacker might attempt to hijack the connection by impersonating one of the participants and issuing signaling commands to take over the conversation. The attacker might also use this type of spoofing to cause the connection to fail, in which case the attack is also considered a DoS attack. Solution: Endpoints can thwart connection hijacking by authenticating the signaling messages. RTP Hijacking: Whereas connection...

Continuous Presence Conferences

Continuous presence (CP) conferences have the benefit of displaying two or more participants simultaneously, not just the image of the loudest speaker. In this mode, the video MP tiles together streams from multiple participants into a single composite video image, as illustrated in Figure 1-2. CP conferences are also referred to as composition mode conferences or Hollywood Squares conferences. The video MP can either scale down the input streams before compositing or maintain the sizes of...

Control Plane

The control plane of the conference server is responsible for establishing a signaling channel with each endpoint, negotiating the type of media, and connecting the endpoints with the mixers on the media plane. The control plane opens H.323 or SIP ports, listens on those ports, and waits for incoming connections. When an endpoint connects to the control plane, the control plane provides the endpoint with the audio and video session capabilities of the conference server as part of media...

Correlating Timebases Using RTCP

The RTCP protocol specifies the use of RTCP packets to provide information that allows the sender to map the RTP domain of each stream into a common reference timebase on the sender, called the Network Time Protocol (NTP) time. NTP time is also referred to as wall clock time because it is the common timebase used for all media transmitted by a sending endpoint. NTP is just a clock measured in seconds. RTCP uses a separate wall clock because the sender may synchronize any combination of media...
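The mapping itself is simple arithmetic. Given the (NTP, RTP) time stamp pair from the most recent RTCP sender report for a stream, a receiver can place any RTP time stamp from that stream onto the sender's wall clock, as in this illustrative Python sketch (clock rates and values are hypothetical, and wraparound handling is simplified).

def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    # Map an RTP time stamp to the sender's NTP wall clock, in seconds.
    # sr_rtp_ts / sr_ntp_seconds come from the last RTCP sender report.
    delta_ticks = (rtp_ts - sr_rtp_ts) % (1 << 32)   # RTP time stamps are 32-bit
    return sr_ntp_seconds + delta_ticks / clock_rate

# Audio sampled at 8000 Hz, video at 90000 Hz (typical RTP clock rates).
audio_t = rtp_to_wallclock(163840, 160000, 1000.0, 8000)     # 1000.48 s
video_t = rtp_to_wallclock(9045000, 9000000, 1000.0, 90000)  # 1000.50 s
print(video_t - audio_t)  # these samples were captured 0.02 s apart on the sender's clock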

Criteria for Determining Whether a Stream Should Be Mixed

The algorithm first determines the number of currently active streams. If the number is less than the maximum allowed (usually three to four), the algorithm includes the next available stream in the mixed stream. Any time the number of current speakers is less than the maximum, the mixer does not invoke the speaker selection algorithm, as long as the stream meets the earlier eligibility criteria. If the number of active streams exceeds the maximum, the algorithm must determine whether a new...

D

Data dependency isolation, error resiliency, 90 data independence on H.263, 337 data integrity, 299 data prioritization, error resiliency, 90-91 data resiliency for H.261, 329-330 DC coefficient, 58 H.263 characteristics, 332 DCT coefficient prediction for MPEG-4, Part 2 codecs, 355 DCT scanning, 69-70 DDoS attacks, 259 deblocking filter for H.264, 352 declining lip sync as goal, 254 decode delay in receiver video path, 244 decoders on hybrid codecs, 72, 74 decoding order, 84 deep-packet...

Delay Accumulation

Skew between audio and video might accumulate over time for either the video or audio path. Each stage of the video conferencing path injects delay, and these delays fall under three main categories:
Delays at the transmitter: The capture, encoding, and packetization delay of the endpoint hardware devices
Delays in the network: The network delay, including gateways and transcoders
Delays at the receiver: The input buffer delay, the decoder delay, and the playout delay on the endpoint hardware...

Delays in the Network Path

A lip sync solution must work in the presence of many delays in the end-to-end path, both in the endpoints themselves and in the network. Figure 7-3 shows the sources of delay in the network between the sender and the receiver. The network-related elements consist of routers, switches, and the WAN. Figure 7-3 End-to-End Delays in a Video Conferencing System xCoder Router X experiences congestion at time T, resulting in a step...

Desktop Conferencing Systems

Low-end video conferencing products include desktop endpoints. When compared to high-end systems, the main difference is the maximum bit rate supported by the encoder in the sending direction. Other components in desktop endpoints include the following:
An inexpensive camera that generates more noise than a high-end model, which paradoxically results in a higher encoded video bit rate for the same quality. In addition, the fixed cameras do not allow remote control via far-end camera control...

Desktop Endpoint Attacks

Desktop video conferencing systems that run on PCs are vulnerable to operating system-based exploits: As mentioned in the section "Malware," a worm can execute a program on a vulnerable machine, causing a DoS attack. As mentioned in the section "Denial of Service," an attacker can attempt to flood a PC with packets that consume resources. Solution: A HIPS running on the PC can mitigate operating system vulnerabilities. Firmware Attacks: Some appliance-based video conferencing endpoints run firmware...

Detecting Stream Loss

Conference server components must handle endpoint failures properly. Signaling protocols might provide some failure information, such as the SIP Session-Expires header. However, the media plane of the entire conferencing architecture must ensure that a backup mechanism detects and handles an endpoint failure in mid-session. The two common mechanisms for handling such scenarios are Internet Control Message Protocol (ICMP) unreachable messages and RTP inactivity timeouts. If the application...
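An RTP inactivity check can be as simple as remembering the arrival time of the last packet per stream and sweeping periodically, as in this illustrative Python sketch (the timeout value is arbitrary).

import time

INACTIVITY_TIMEOUT = 30.0          # seconds without RTP before declaring loss
last_packet_time = {}              # SSRC -> arrival time of last RTP packet

def on_rtp_packet(ssrc):
    # Called by the media plane for every RTP packet received.
    last_packet_time[ssrc] = time.monotonic()

def sweep_inactive():
    # Periodic check; returns the SSRCs whose streams appear to have died.
    now = time.monotonic()
    dead = [ssrc for ssrc, t in last_packet_time.items()
            if now - t > INACTIVITY_TIMEOUT]
    for ssrc in dead:
        del last_packet_time[ssrc]  # tear down or re-invite the endpoint here
    return dead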

DHCP Exhaustion

DHCP exhaustion is a Layer 2 attack that also implements a DoS. An attacker sends a flood of DHCP request packets to the DHCP server, each requesting an IP address for a random MAC address. Eventually, the DHCP server runs out of available IP addresses and stops issuing DHCP bindings. This failure means that other hosts on the network cannot obtain a DHCP lease, which causes a DoS. Solution: Cisco switches implement a feature called DHCP snooping, which places a rate limit on DHCP requests.

Distributed Architecture

To scale a conferencing system to a large number of participants, the conferencing system must be decomposed into many different components, each on a separate hardware platform, which are geographically dispersed across the network. These components must establish signaling relationships to work together as a single system. The distributed system appears to the end user as a single device, but in fact, it is a network of devices, each providing a specific service. The Session Initiation...

E

E.164 Dialed Digits, 187 early offer, 158 ECS (Empty Capability Set) messages, 207 ejecting conference participants, 9 e-mail ID (H.323), 187 encoder module, 36-37 encoders, 55 audio transmission path delay sources, 233 on hybrid codecs, 76 transform processing, 55-57 adaptive encoding, 71-72 binary arithmetic coders, 68 coefficients, 58-59 DCT scanning, 69-70 entropy coding, 62-68 quantization, 59, 62 encoding delay, 241 encryption asymmetric encryption, 300 certificates, 302-304 digital...

Early and Delayed Offer

Endpoints establish connections on the media plane by first negotiating media properties such as codec types, packetization periods, media IP addresses and RTP port numbers, and so on. This information is transmitted with SIP messages using SDP. An endpoint may use one of two methods of exchanging SDP information:
Early offer: In the early offer, the endpoint sends the media SDP in the initial INVITE and receives an answer from the conference server.
Delayed offer: In a delayed offer, the endpoint sends an...
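The two patterns differ only in where the SDP travels, as this simplified message outline shows (it omits provisional responses and authentication):

Early offer                        Delayed offer
-----------                        -------------
INVITE (SDP offer)      -->        INVITE (no SDP)        -->
200 OK (SDP answer)     <--        200 OK (SDP offer)     <--
ACK                     -->        ACK (SDP answer)       -->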

Encoder

The encoding module compresses the mixed stream using the compression algorithm (for example, G.711 uLaw, G.729, G.722, and so on) negotiated for this endpoint. After compression, the encoder performs the RTP packetization. The steps in RTP packetization include the following:
Setting the RTP payload type: The encoder sets the payload type field based on the codec used for compressing the payload. The payload type indicates to the receiver how to decode the arriving packet.
Setting the RTP time...
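The fixed part of the RTP header that the encoder fills in can be sketched directly from the RFC 3550 field layout; the Python below is illustrative (no header extensions, no CSRC list, hypothetical field values).

import struct

def build_rtp_packet(payload, payload_type, seq, timestamp, ssrc, marker=0):
    # Build a minimal RTP packet: 12-byte fixed header + payload (RFC 3550).
    version = 2
    byte0 = version << 6            # padding=0, extension=0, CSRC count=0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# One 20 ms G.711 u-law frame (payload type 0) carries 160 samples,
# so the timestamp advances by 160 per packet at an 8000 Hz clock.
pkt = build_rtp_packet(b"\xff" * 160, payload_type=0,
                       seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(pkt))   # 172 bytes: 12-byte header + 160-byte payload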

Endpoint-Independent Filtering

Figure 8-9 shows a NAT that uses endpoint-independent filtering. Figure 8-9 Endpoint-Independent Filtering
Figure 8-9 includes the following addresses that appear on the internal private network:
Ai:Pi The source address:port of packets from the internal endpoint
Ae:Pe The destination address:port of packets from the internal endpoint
Figure 8-9 also includes the following addresses that appear on the public network:
Am:Pm The source address:port of packets from the NAT to endpoints on the public...

Entropy Coding

Table A-18 shows the attributes of entropy coding in H.264. Table A-18 Entropy Coding for H.264
The run and level are not coded jointly.
H.264 codes the number of coefficients using a context-adaptive VLC table.
H.264 codes the zero-run length sequence using a context-adaptive VLC.
H.264 codes the coefficient levels using a fixed VLC table.
H.264 codes trailing ones (+1 or -1) as a special case.
Motion vectors are coded using a modified Exp-Golomb, nonadaptive VLC.
Two zigzag...
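For reference, the unsigned Exp-Golomb code mentioned here is easy to reproduce. The Python sketch below is illustrative and shows only the basic ue(v) mapping, not the modified variant the table refers to for motion vectors.

def exp_golomb_ue(v):
    # Unsigned Exp-Golomb code ue(v): M leading zeros, a 1, then the remaining
    # bits, where the codeword is the binary form of v + 1.
    code = bin(v + 1)[2:]                 # binary string of v + 1
    return "0" * (len(code) - 1) + code

for v in range(6):
    print(v, exp_golomb_ue(v))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101, 5 -> 00110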

Entry IVR

Play: Welcome to xxx. Enter conference ID.
In a distributed conferencing model, however, one central, logical conference server is composed of many individual servers. An endpoint might need to be moved from one physical server to another. In Figure 5-12, endpoint EP dials into the entry IVR associated with the conference server, enters the meeting ID, and goes through the name-recording process. Centralized logic then moves the endpoint to another entity in the conference server that hosts the...

Error Resiliency

If the network drops bitstream packets, decoders may have difficulty resuming the decoding process for several reasons:
Bitstream parameters may change incrementally from one MB to another. One example is the quantization level: most codecs allow the bitstream to change the quantization level by a delta amount between MBs. If the network drops a packet, the decoder will not have access to the previous incremental changes in the quantization level and will not be able to determine the current...

Escalation of Point-to-Point to Multipoint Call

In this scenario, a point-to-point call between two participants becomes a conference call with more than two parties. Participant A is in a point-to-point call with participant B and wants to invite a third participant, participant C. Participant A finds a conference server, sets up the conference, gets the URI or meeting ID, and transfers the point-to-point call to the conference server. Participant A then invites participant C into the conference call. Participant A can add participant C...

Evaluating Video Quality: Bit Rate and Signal-to-Noise Ratio

When evaluating the efficiency of a video codec, there is one primary criterion: the quality at a given bit rate. Most video conferencing endpoints negotiate a maximum channel bit rate before connecting a call, and the endpoints must limit the short-term one-way average bit rate to a level below this negotiated channel bit rate. A higher-efficiency codec can provide a higher-quality decoded video stream at the negotiated bit rate. Quality can be directly measured in two ways: By visually...
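The objective measurement is usually expressed as peak signal-to-noise ratio (PSNR). A short Python sketch of the standard formula for 8-bit video follows (illustrative; real evaluations average PSNR over the luminance plane of every frame).

import math

def psnr(original, decoded, peak=255):
    # PSNR in dB between two equally sized lists of 8-bit pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")       # identical images
    return 10 * math.log10(peak * peak / mse)

print(psnr([100, 120, 140, 160], [101, 119, 142, 157]))  # about 42.4 dB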

Event Subscription and Notification

RFC 3265 extends the SIP specification, RFC 3261, to support a general mechanism allowing subscription to asynchronous events. Such events can include statistics, alarms, and so on. The two types of event subscriptions are in-dialog and out-of-dialog. A subscription that uses the Call-ID of an existing dialog is an in-dialog subscription, whereas the out-of-dialog subscription carries a Call-ID that is not part of the existing ongoing dialogs. Figure 5-6 shows an example of out-of-dialog...
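As a sketch (header values are hypothetical), an out-of-dialog subscription to a conference event package looks like the following; the server confirms with a 200 OK and then delivers state in NOTIFY requests.

SUBSCRIBE sip:conf123@conf.example.com SIP/2.0
Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bKnashds7
From: <sip:moderator@example.com>;tag=12341234
To: <sip:conf123@conf.example.com>
Call-ID: 9fa703665604736c50129812@192.0.2.10
CSeq: 1 SUBSCRIBE
Contact: <sip:moderator@192.0.2.10>
Event: conference
Expires: 3600
Content-Length: 0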

F

Fast Connect feature (H.323), 204-206 Fast Connect method (H.323v4), 273 FDCT (forward DCT), 56 features of reservationless conferences, 8-9 of scheduled conferences, 8-9 FECC (far-end camera control), 17-18 filtering characteristics of NAT, 279 address- and port-dependent filtering, 281 endpoint-dependent filtering, 281 endpoint-independent filtering, 279-281 firewalls 284-285 filtering characteristics, 279-281 mapping characteristics, 278-279 symmetric NAT, 282-283 PAT, 276-277 firmware...

Feedback Information

At Cisco Press, our goal is to create in-depth technical books of the highest quality and value. Each book is crafted with care and precision, undergoing rigorous development that involves the unique expertise of members from the professional technical community. Readers' feedback is a natural continuation of this process. If you have any comments regarding how we could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through email at...

Floor Control

Floor control coordinates simultaneous access to the media resources in a conference. For instance, the meeting organizer or moderator can ensure that all participants hear only one participant. Or, the moderator can allow only certain participants to enter information into a shared document. End users can make floor control requests through a web interface or IVR. In addition, endpoints can provide access to floor control via floor control protocols. Floor control protocols allow the endpoints...

Foreword

I still remember the first video conferencing network I helped implement almost 20 years ago. It was an H.320-based system that used multiple ISDN channels to connect endpoints at the relatively high (for the time) speed of 768 kbps. However, building the video conferencing network was actually easier than using it. Users had to navigate through a complex array of parameters such as service provider IDs (SPID) and telephone IDs (TID) using a 30-button remote control just to set up the session....

Forming RTCP Packets

Each RTP stream has an associated RTCP packet stream, and the sender transmits an RTCP packet once every few seconds, according to a formula given in RFC 3550. As a result, RTCP packets consume a small amount of bandwidth compared to the RTP media stream. For each RTP stream, the sender issues RTCP packets at regular intervals, and those packets contain a pair of time stamps: an NTP time stamp and the corresponding RTP time stamp associated with that RTP stream. This pair of time stamps...

Full Mesh Networks

Another option for decentralized conferencing is a full-mesh conference, shown in Figure 2-7. This architecture has no centralized audio mixer or MP. Instead, each endpoint contains an MP that performs media mixing, and all endpoints exchange media with all other endpoints in the conference, creating an N-by-N mesh. Endpoints with less-capable MPs provide less mixing functionality. Because each device sends its media to every other device, each one establishes a one-to-one media connection with...

Gatekeeper Signaling Options

There are two signaling modes in a gatekeeper-controlled H.323 network: gatekeeper routed call signaling (GKRCS) and direct endpoint signaling. When the gatekeeper is configured for direct endpoint signaling, the calling and called endpoints exchange RAS admission control messages with the gatekeeper, but the H.225 and H.245 messages are exchanged directly between the calling and called endpoints, without gatekeeper involvement. Figure 6-12 shows the signaling path for direct endpoint signaling. Figure 6-12 Direct Endpoint...

General Port Based Attacks

Much like PC-based endpoints, servers require protection to thwart network port-based attacks such as malware and DoS attacks. Solution: You can mitigate port-based attacks as follows:
Use HIPS to detect attacks on the machine.
Install a virus scanner on the server.
Place a firewall in front of the server. In addition to typical firewall access control lists (ACLs), the administrator can configure the firewall to allow only call control traffic to the servers. Typically, UDP-oriented...

Goals and Methods

To provide an understanding of different video conferencing deployment models, including centralized and distributed architectures, by using real-world examples. To explain how video conferencing infrastructure uses signaling standards to establish synchronized, secure conference connections. The book uses call flow diagrams to show each signaling message needed to create a conference. To provide a comparison of the most widely used video codecs, in a concise reference format.

H

H.224, FECC applications, 17-18 H.225, 188 gatekeepers, 217 messages, 188-189 Alerting, 190 Call Proceeding, 190 Connect, 190 Notify, 191 Release Complete, 191 Setup, 189-190 Setup ACK, 190 H.232v4, H.235, 313 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.235, 313 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.235.1, 314-316 H.235.2, 316-319 H.235.3, 319 H.235.6, 319-320 H.245, 191-192 DTMF relay support indicators, 193-194 messages CLC ACK, 201 Close...

H.225 Call Setup for Video Devices Using a Gatekeeper

The message sequence chart shown in Figure 6-16 illustrates two endpoints registering with a gatekeeper. The call flow shows endpoint A initiating a video call to endpoint B. In the diagram, both endpoints first register with the H.323 gatekeeper. After registration, Endpoint A initiates a call to Endpoint B using the gatekeeper direct endpoint signaling model. Figure 6-16 H.225 Connection Establishment with a Gatekeeper H.225 Video Call Establishment Via Gatekeeper Direct Endpoint Signaling...

H.225 Call Signaling

The H.225 recommendation describes the protocol for H.323 session control, including call initiation and connection management. It fully describes how an H.323 call is initiated, established, and disconnected. H.225 is derived from the Q.931 ISDN signaling standard, after modification for packet networks. It is based on Abstract Syntax Notation 1 (ASN.1) encoding. This section reviews common H.225 message types and content. H.225 uses a reliable TCP connection between devices on the IP network....

H.235.2

H.235.2 is a protocol that uses certificates to provide authentication and integrity for H.323 signaling. In addition, H.235.2 can provide nonrepudiation. When used within a single administrative domain, a certificate-based PKI provides a much more scalable way of distributing credentials than using preshared keys. H.235.2 does not specify how certificates should be distributed or how endpoints should validate certificates. H.235.2 allows endpoints to create a digital signature for a packet by...

H.235.3

H.235.3 is a hybrid security profile that combines the certificate method of H.235.2 with symmetric keys of H.235.1. This profile uses certificates to establish authentication for the initial connection, as defined in H.235.2. Endpoints then exchange Diffie-Hellman info and use the Diffie-Hellman secret as the key for generating HMAC authentication tags in subsequent messages, as defined in H.235.1. This scheme benefits from the scalability of certificate-based PKI to establish identity and...
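The authentication tag itself is ordinary HMAC computation. The Python sketch below is illustrative only: H.235.1 specifies HMAC-SHA1 with a truncated tag, and the key here merely stands in for the Diffie-Hellman shared secret.

import hmac, hashlib

shared_secret = b"derived-from-diffie-hellman"   # placeholder key material

def auth_tag(message, key=shared_secret, tag_len=12):
    # HMAC-SHA1 tag, truncated to tag_len bytes (96 bits), over the message.
    return hmac.new(key, message, hashlib.sha1).digest()[:tag_len]

msg = b"H.225 Setup message bytes"
tag = auth_tag(msg)
# The receiver recomputes the tag with the same secret and compares in
# constant time; any modification of the message changes the tag.
print(hmac.compare_digest(tag, auth_tag(msg)))         # True
print(hmac.compare_digest(tag, auth_tag(msg + b"x")))  # False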

H.235.6

Whereas most SIP endpoints use SRTP to encrypt media, most interoperable H.323 implementations use H.235.6 for media encryption. Like SRTP, H.235.6 uses a session key to encrypt the payload section of an RTP packet. However, unlike SRTP, H.235.6 does not authenticate the entire RTP packet. H.235.6 defines the voice encryption profile for H.235 to encrypt voice or video media. H.235.6 allows several encryption algorithms: AES, RC2, DES, and Triple DES. However, the most secure of these is...

H.245 Control Protocol

The H.245 recommendation provides the mechanism for the negotiation of media types and RTP channel establishment between endpoints. Using the H.245 control protocol, endpoints exchange details about the audio and video decoding capability each device supports. H.245 also describes how logical channels are opened so that media may be transmitted. Like H.225, H.245 messages are encoded using ASN.1 notation. The H.245 session information is conveyed to the calling device during the H.225 exchange....

H.264 Error Resilience

Table A-19 shows that H.264 offers many types of data resiliency. Table A-19 Data Resiliency for H.264 The higher complexity and flexibility of the H.264 codec allows it to deliver superior performance relative to the other codecs. An article published by the IEEE in 2003, Rate-Constrained Coder Control and Comparison of Video Coding Standards, provides PSNR bit rate graphs for several test sequences using real-time encoding. The results show H.264, Baseline profile, as the clear leader The...

H.323 Gatekeepers

A gatekeeper is an optional H.323 component on the network. When present, it provides important services for terminals, gateways, and MCUs under the control of a system administrator. These services include allowing endpoints to call one another using a dial plan and providing access and bandwidth control. The next section provides details about gatekeeper services. Endpoints, gateways, and MCUs can be configured to use the services of a gatekeeper. These devices use the RAS protocol for...

H.323 Gateways

H.323 gateways allow interworking between devices on the IP network and devices on other network types, such as the PSTN. The gateway provides transparent signaling and media conversion between packet-and circuit-switched networks, allowing endpoints to communicate with remote devices without regard for the signaling methodology used by those devices. Figure 6-11 shows an H.323 gateway interconnecting the H.323 and PSTN networks. Figure 6-11 Interfacing Between the H.323 and PSTN Networks...

H.323 Overview

H.323 is a widely deployed International Telecommunication Union (ITU) standard, originally established in 1996. It is part of the H.32x series of protocols and describes a mechanism for providing real-time multimedia communication (audio, video, and data) over an IP network. In this chapter, the intent is to familiarize you with some of the basic concepts involved in the H.323 architecture and signaling models, with an emphasis on voice and video conferencing. It does not attempt to cover all...

H.323 Terminals

Terminals are end-user devices and may communicate with other terminals on the network, or with gateways when calling devices on other network types. Terminals include phones and phone systems running the H.323 protocol stack, desktop and room conferencing systems, and personal computers running an H.323 multimedia communications program such as Microsoft NetMeeting. Basic devices provide audio support and can optionally include video or data features, such as a whiteboard or application...

HD-Capable Codecs

Several codecs support high-definition (HD) video. These codecs include Windows Media 9, H.264 baseline profile (level 3.1 or level 4.0 and above), and H.264 Hi profile. H.264 baseline profile does not allow field (interlaced) encoding but does allow frame (progressive) encoding. HD endpoints can still support H.264 baseline profile to encode interlaced video, but this method requires the endpoint to combine the top and bottom fields into a single merged frame. H.264 main and Hi profiles support...

High-Resolution Video Input

Endpoints that intend to use the full resolution available from a standard video camera must use video data from both fields of each frame and therefore must use a video codec that handles interlaced video. When you are using video from an NTSC camera, endpoints that have an interlace-capable codec can support resolutions up to 640x480 at 60 fields per second. NOTE: Interlaced video can be de-interlaced using complex algorithms that attempt to expand each field into a full-resolution frame. The...

Hold and Resume

The user presses the Hold button on the phone to place the conference call on hold. The endpoint initiates a RE-INVITE and puts the audio stream in sendonly mode, as shown in Figure 5-15 (EP in the Conference and Presses Hold Button). In the following SDP offer/answer exchange, note that the endpoint adds the attribute line a=sendonly, causing audio to flow only from the EP to the conference server. The conference server responds with a=recvonly. The...
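A sketch of the hold offer/answer pair follows (ports are hypothetical, and only the relevant audio lines are shown):

Offer from the endpoint (hold):
m=audio 49170 RTP/AVP 0
a=sendonly

Answer from the conference server:
m=audio 32416 RTP/AVP 0
a=recvonly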

How This Book Is Organized

Chapter 1 provides an overview of the conferencing models and introduces the basic concepts. Chapters 2 through 8 are the core chapters and can be read in any order. If you intend to read them all, the order in the book is an excellent sequence to use. The chapters cover the following topics: Chapter 1, "Overview of Conferencing Services": This chapter reviews the elementary concepts of conferencing, describing the various types of conferences and the features found in each. It also provides an...

Human Perceptions

User-perceived objection to unsynchronized media streams varies with the amount of skew; for instance, a misalignment of audio and video of less than 20 milliseconds (ms) is considered imperceptible. As the skew approaches 50 ms, some viewers will begin to notice the audio/video mismatch but will be unable to determine whether video is leading or lagging audio. As the skew increases, viewers detect that video and audio are out of sync and can also determine whether video is leading or lagging...

Hybrid Coding

The previous discussion covered the coding steps taken for intraframes. As discussed in the section Encoder and Decoder Overview, intraframes are coded using information only from the current frame, and not from other frames in the video sequence. However, other than Motion-JPEG, codecs for video conferencing use a hybrid approach consisting of spatial coding techniques discussed previously, along with temporal compression that takes advantage of frame-to-frame correlation in the time domain....

Hybrid Decoder

When analyzing a hybrid codec, it is easier to start by analyzing the decoder rather than the encoder, because the encoder has a decoder embedded within it. Figure 3-17 shows the block diagram for the hybrid decoder. The encoder creates a bitstream for the decoder by starting with an original image, with frame number N, denoted by Fn(o). Because this frame is the original input to the encoder, it is not shown in the decoder diagram of Figure 3-17. For this image, the output of the encoder...

Iii

The set of blocks on the left shows each possible coefficient location in the transform output array, and the set of blocks on the right shows the corresponding pixel pattern for each coefficient weighting value. In Figure 3-7, all the basis functions have been normalized so that the lowest-valued pixel in each basis function displays as black, and the highest-valued pixel in each basis function displays as white. The coefficients correspond to frequency patterns as follows Coefficients near...
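The basis patterns in the figure can be regenerated directly from the 2-D DCT definition. The following Python sketch is illustrative and prints one 8x8 basis block for a chosen coefficient position (u, v).

import math

def dct_basis(u, v, size=8):
    # Return the size x size DCT basis pattern for coefficient position (u, v).
    def a(k):                      # normalization factor
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
    return [[a(u) * a(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * size))
             * math.cos((2 * y + 1) * v * math.pi / (2 * size))
             for x in range(size)]
            for y in range(size)]

# (0, 0) is the flat DC pattern; higher (u, v) adds finer spatial detail.
for row in dct_basis(1, 0):
    print(" ".join(f"{p:+.2f}" for p in row))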

Info

In this scenario, the encoder normally encodes every odd field to achieve 30 FPS. However, if the content of the video changes by a large amount as a result of excessive motion in the video stream, the encoder might fall behind for two reasons The CPU requirements of the encoder might increase, resulting in higher per-frame encoding latency, which might force the encoder to reduce the frame rate. The extra motion in the input video might cause the size of the encoded frames to temporarily...

Intra Prediction

H.264 has an intra prediction mode that predicts pixels in the spatial domain before the intra transform process. For luminance, the encoder can use two different modes a 16x16 prediction mode or a 4x4 prediction mode. For chrominance, the encoder can use an 8x8 prediction mode. In both cases, the pixels inside the block are predicted from previously decoded pixels adjacent to the block. The 16x16 prediction mode has four methods of prediction. Figure A-3 shows two modes. Figure A-3 Two of the...

Introduction

In past years, video conferencing has been something of a novelty, and there has been a certain tolerance for quality problems. As audio and video conferencing move more into the mainstream, however, customers and end users will demand greater performance, reliability, security, and scalability from their systems. Voice and Video Conferencing Fundamentals provides readers with in-depth insight into the conferencing technologies and associated protocols. The information provided will enable...

IPsec

IPsec operates by applying encryption at the IP layer, below the TCP and UDP stack. Because IPsec applies to the lowest layers of the IP stack, endpoints typically implement it as part of the operating system kernel, independently of the upper-layer application. Therefore, the applications are unaware of the underlying security, but the IPsec tunnel protects the UDP and TCP packets. However, administrators and users must manually configure IPsec on the originating and terminating endpoints and...

ISDN Gateway

In the early days of IP video conferencing, the only practical way to allow NAT/FW traversal between enterprises was to circumvent the problem by using H.320 ISDN gateways to connect two endpoints over the public switched telephone network (PSTN). Figure 8-14 shows the topology for interenterprise H.323 connectivity, in which two endpoints connect over the PSTN WAN. Figure 8-14 Using ISDN to Circumvent the NAT/FW Traversal Problem The major downside of this approach is the added delay of...

Joining a Scheduled or Reservationless Conference

At meeting time, each participant in a scheduled or reservationless conference typically dials the access number provided, which usually connects to an IVR system. The IVR prompts the participant to enter the meeting ID number and might ask the participant to "speak your name at the tone" for a recorded name announcement. When the IVR connects the participant to the conference, the IVR plays the recorded name for all participants to hear. Alternatively, each participant might enter a predefined...

Key Distribution

For two endpoints to use symmetric encryption for media or signaling, the endpoints must agree to use a common key for both encryption and decryption, a process called key distribution or key agreement. As mentioned previously, one method of performing key distribution is to distribute preshared keys out-of-band in a secure manner. However, this method of key distribution does not scale well. Two other methods of key distribution include certificate-based distribution and Diffie-Hellman key...
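As a toy illustration of Diffie-Hellman key agreement (deliberately tiny numbers; real deployments use large standardized groups and authenticated exchanges):

import random

# Public parameters: a prime modulus p and a generator g (toy-sized here).
p, g = 23, 5

# Each endpoint picks a private value and publishes g^private mod p.
a_private = random.randrange(2, p - 1)
b_private = random.randrange(2, p - 1)
a_public = pow(g, a_private, p)
b_public = pow(g, b_private, p)

# Each side combines its private value with the other's public value.
a_secret = pow(b_public, a_private, p)
b_secret = pow(a_public, b_private, p)
assert a_secret == b_secret   # both sides share a key that never crossed the wire
print("shared secret:", a_secret)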