Failover and Redundancy

Failover in PIX allows a standby system to take over the functionality of the primary system as soon as the primary fails. This changeover can be set up to be stateful, meaning that the connection information stored on the failing PIX is transferred to the PIX taking over. Before discussing how failover takes place on the PIX Firewall, it is useful to define some of the terminology that the PIX uses for the failover functionality.

Active Unit Versus Standby Unit

The active unit actively performs normal network functions. The standby unit only monitors, ready to take control should the active unit fail to perform.

Primary Unit Versus Secondary Unit

The failover cable has two ends, primary and secondary. The primary unit is determined by the unit that has the end of the failover cable marked "primary." The secondary unit is connected to the end of the failover cable marked "secondary."

System IP Address Versus Failover IP Address

The system IP address is the IP address of the primary unit (upon bootup). The failover IP address is the IP address of the secondary unit.

State information for stateful failover is communicated to the PIX taking over through an Ethernet cable joining the two PIXes. However, the actual detection of whether a unit has failed takes place via keepalives passed over a serial cable connected between the two PIXes (see the note later in this section on the enhancements made in version 6.2 of PIX). Both units in a failover pair communicate through the failover cable, which is a modified RS-232 serial link cable that transfers data at 117,760 baud (115 K, counting 1 K as 1,024). The data provides the unit identification of primary or secondary, indicates the other unit's power status, and serves as a communication link for various failover communications between the two units.

For failover to work correctly, the hardware, software, and configurations on the two PIXes need to be identical. It is simple enough to use commands on the PIX to synchronize the configurations of the two PIXes, but the hardware and software must be matched up manually.

The two units send special failover hello packets to each other over all network interfaces and the failover cable every 15 seconds. The failover feature in PIX Firewall monitors failover communication, the other unit's power status, and hello packets received at each interface. If two consecutive hello packets are not received within a time determined by the failover feature, and the cause is something other than a loss of power on the other unit, failover runs a series of interface tests to determine which unit has failed and, if the active unit is the one that failed, transfers active control to the standby unit. These tests are as follows:

• Link up/down test—This is a test of the network interface card. If an interface card is not plugged into an operational network, it is considered failed (for example, a switch failed due to a faulty port, or a cable is unplugged).

• Network activity test—This is a received network activity test. The unit counts all received packets for up to 5 seconds. If any packets are received during this interval, the interface is considered operational, and testing stops. If no traffic is received, the ARP test begins.

• ARP test—The ARP test consists of reading the unit's ARP cache for the ten most recently acquired entries. One at a time, the unit sends ARP requests to these machines, attempting to stimulate network traffic. After each request, the unit counts all received traffic for up to 5 seconds. If traffic is received, the interface is considered operational. If no traffic is received, an ARP request is sent to the next machine. If at the end of the list no traffic has been received, the ping test begins.

• Broadcast ping test—The ping test consists of sending out a broadcast ping request. The unit then counts all received packets for up to 5 seconds. If any packets are received during this interval, the interface is considered operational, and testing stops. If no traffic is received, the testing starts again with the ARP test.

The purpose of these tests is to generate network traffic to determine which (if either) unit has failed. At the start of each test, each unit clears its received packet count for its interfaces. At the conclusion of each test, each unit looks to see if it has received any traffic. If it has, the interface is considered operational. If one unit receives traffic for a test and the other unit does not, the unit that received no traffic is considered failed. If neither unit has received traffic, testing moves on to the next test.
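The decision rule applied at the end of each test can be sketched in a few lines of Python. This is only an illustrative model of the logic described above, not PIX code: the function names, the verdict strings, and the idea of feeding in pre-recorded results per test are all assumptions made for the example.

```python
from typing import Iterable, Tuple


def evaluate_round(a_received: bool, b_received: bool) -> str:
    """Apply the per-test rule to the traffic seen by unit A and unit B."""
    if a_received and b_received:
        return "both-operational"   # both interfaces saw traffic
    if a_received:
        return "B-failed"           # only A saw traffic, so B is considered failed
    if b_received:
        return "A-failed"           # only B saw traffic, so A is considered failed
    return "next-test"              # neither saw traffic: inconclusive


def run_test_sequence(rounds: Iterable[Tuple[str, bool, bool]]) -> Tuple[str, str]:
    """Walk through (test_name, a_received, b_received) results in order.

    Returns (verdict, name_of_last_test_run). The verdict is "undetermined"
    if every supplied round was inconclusive.
    """
    last_test = "none"
    for name, a_received, b_received in rounds:
        last_test = name
        verdict = evaluate_round(a_received, b_received)
        if verdict != "next-test":
            return verdict, name
    return "undetermined", last_test


if __name__ == "__main__":
    # Hypothetical results: nothing heard in the activity test,
    # then only unit B hears replies during the ARP test.
    results = [("activity", False, False), ("arp", False, True)]
    print(run_test_sequence(results))
```

In this model, each tuple stands for one test in the order given in the list above, and the sequence stops at the first test that produces a conclusive result.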

The PIX taking over assumes the IP addresses as well as the MAC addresses of the previously active unit and begins accepting traffic, which makes the changeover transparent to the hosts attached to the interfaces of the failed PIX. The new standby unit, in turn, assumes the failover IP and MAC addresses that the other unit used while it was the standby. Because network devices see no change in these addresses, no ARP entries change or time out anywhere on the network.
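The address handover can be pictured as a simple swap between the two units. The sketch below is a toy model under stated assumptions: the `Unit` class, the `fail_over` function, and the sample IP and MAC values are all hypothetical and exist only to show why attached hosts see no address change.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str   # "primary" or "secondary"; fixed by the failover cable end
    role: str   # "active" or "standby"; changes on failover
    ip: str     # address currently presented on an interface
    mac: str    # MAC address currently presented on that interface


def fail_over(active: Unit, standby: Unit) -> None:
    """Swap roles and addresses in place.

    The unit becoming active takes over the system IP/MAC, and the unit
    stepping down takes the failover IP/MAC, so the active addresses
    seen by the network never change.
    """
    active.ip, standby.ip = standby.ip, active.ip
    active.mac, standby.mac = standby.mac, active.mac
    active.role, standby.role = "standby", "active"


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    primary = Unit("primary", "active", "192.168.1.1", "00:00:5e:00:53:01")
    secondary = Unit("secondary", "standby", "192.168.1.2", "00:00:5e:00:53:02")
    fail_over(primary, secondary)
    print(primary)
    print(secondary)
```

Note that the `name` field does not change: primary and secondary are fixed by cabling, while active and standby follow the current state.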

In general, the following events can often cause a PIX Firewall to fail over to a backup:

• Running out of memory. PIX looks for out-of-memory errors for 15 consecutive minutes before it fails over.

• Power outage on the primary PIX, or a reboot.

• An interface going down for more than 30 seconds.

You can also force a failover manually by issuing the failover active command on the standby PIX Firewall (normally the secondary unit).
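As a rough summary, the triggers listed above can be written as a single predicate. This is a simplified sketch, not the PIX's actual decision logic: the function name, its parameters, and the way the thresholds are compared are assumptions, and the list in the text is itself not exhaustive.

```python
OOM_MINUTES_BEFORE_FAILOVER = 15    # consecutive minutes of out-of-memory errors (from the text)
INTERFACE_DOWN_SECONDS_LIMIT = 30   # an interface must be down longer than this (from the text)


def should_fail_over(peer_has_power: bool,
                     oom_minutes: float,
                     interface_down_seconds: float,
                     forced: bool = False) -> bool:
    """Return True if any of the listed failover triggers applies.

    `forced` models an operator issuing the failover active command.
    """
    if forced:
        return True
    if not peer_has_power:
        return True  # power outage or reboot of the active unit
    if oom_minutes >= OOM_MINUTES_BEFORE_FAILOVER:
        return True
    if interface_down_seconds > INTERFACE_DOWN_SECONDS_LIMIT:
        return True
    return False


if __name__ == "__main__":
    print(should_fail_over(True, 0, 45))   # interface down for 45 seconds
    print(should_fail_over(True, 5, 10))   # nothing has crossed a threshold
```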

In case of a stateful failover, the following information is replicated from the primary PIX to the secondary PIX:

• The configuration

• TCP connection table, including timeout information for each connection

• Translation (xlate) table

• System up time; that is, the system clock is synchronized on both PIX Firewall units

The rules for the replication of the configuration are as follows:

• When the standby unit completes its initial bootup, the active unit replicates its entire configuration to the standby unit.

• The commands are sent via the failover cable.

• The write standby command can be used on the active unit to force the entire configuration to the standby unit.

• Configuration replication copies the configuration from memory to memory only. After replication, a write mem is needed to write the configuration into Flash memory.
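The memory-to-memory rule is easy to miss, so here is a small model of it. The `Pix` class and its methods are hypothetical stand-ins named after the commands in the text (write standby, write mem), and the configuration lines are placeholders; the point is only that a replicated configuration does not survive a reload unless it is also written to Flash.

```python
from typing import List, Optional


class Pix:
    """Toy model with a running configuration (memory) and a saved one (Flash)."""

    def __init__(self, running: Optional[List[str]] = None,
                 flash: Optional[List[str]] = None) -> None:
        self.running = list(running or [])
        self.flash = list(flash or [])

    def write_mem(self) -> None:
        """Model of write mem: copy the running configuration into Flash."""
        self.flash = list(self.running)

    def reload(self) -> None:
        """Model of a reboot: the running configuration is rebuilt from Flash."""
        self.running = list(self.flash)


def write_standby(active: Pix, standby: Pix) -> None:
    """Model of write standby: replicate memory to memory, leaving Flash untouched."""
    standby.running = list(active.running)


if __name__ == "__main__":
    active = Pix(running=["hostname pix-a", "failover"])  # placeholder config lines
    standby = Pix()
    write_standby(active, standby)
    print(standby.running, standby.flash)  # replicated in memory, Flash still empty
    standby.write_mem()
    standby.reload()
    print(standby.running)                 # survives the reload after write mem
```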

Some of the significant information that is not replicated is as follows:

• User authentication (uauth) table

• ISAKMP and IPsec SA table

• Routing information

The secondary PIX must rebuild this information to perform the functions of the primary PIX, which has failed.
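The split between replicated and non-replicated state can be modelled as a filter over the active unit's tables. This is a sketch under assumptions: the key names, the `replicate_state` and `missing_after_failover` helpers, and the sample entries are invented for illustration, and the two sets only cover the items named in the lists above.

```python
import copy
from typing import Any, Dict, List

# Items the text lists as replicated during stateful failover.
REPLICATED = {"configuration", "tcp_connections", "xlate", "uptime"}
# Items the text lists as not replicated (not an exhaustive list).
NOT_REPLICATED = {"uauth", "isakmp_ipsec_sa", "routing"}


def replicate_state(active_state: Dict[str, Any]) -> Dict[str, Any]:
    """Return an independent copy of only the state that is replicated."""
    return {key: copy.deepcopy(value)
            for key, value in active_state.items()
            if key in REPLICATED}


def missing_after_failover(active_state: Dict[str, Any]) -> List[str]:
    """List the tables the unit taking over has to rebuild on its own."""
    return sorted(key for key in active_state if key not in REPLICATED)


if __name__ == "__main__":
    # Hypothetical state on the active unit.
    state = {
        "tcp_connections": [("10.0.0.5", 1025, "192.0.2.10", 80)],
        "xlate": {"10.0.0.5": "192.0.2.5"},
        "uauth": {"10.0.0.5": "alice"},
        "routing": ["0.0.0.0/0 via 192.0.2.1"],
    }
    print(sorted(replicate_state(state)))
    print(missing_after_failover(state))
```

In this model, a user in the uauth table would have to authenticate again after a failover, while an established TCP connection would carry over.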

Version 6.2 of the PIX Firewall adds more functionality to the PIX failover feature set. From this version on, the failover communication can take place over the Ethernet cable used to copy state information from the primary PIX to the secondary PIX. This gets rid of the need to use a separate serial failover cable, thus overcoming distance limitations created by its use. A dedicated LAN interface and a dedicated switch/hub (or VLAN) are required to implement LAN-based failover. A crossover Ethernet cable cannot be used to connect the two PIX Firewalls. PIX LAN failover uses IP protocol 105 for communication between the two PIXes.
