LANs to WANs

Sunday, December 9, 2007

WAN Restoration Planning

Redundant Carrier Systems

The networks of the major carriers are built as redundant systems, meaning that there is a duplicate or backup system immediately available to overcome outages that may occur virtually anywhere on their networks. While the large carriers offer a high degree of redundancy in their networks, smaller competitors may not. Many CLECs, for example, may have fiber rings in the cities they serve, which move traffic between their regional or national data backbones. But not all the CLECs have dual-ring architectures that can route traffic in another direction if a cable cut occurs. Given the high cost of building resilient networks and the shortage of capital for infrastructure enhancement, telecom and IT managers must factor these considerations into their decision-making when selecting carrier services.

15.2.1 Switching Systems

For voice services and low-speed data, local exchange carriers (LECs) operate central offices, which use voice- and data-switching equipment from vendors such as Lucent Technologies, Nortel Networks, Fujitsu, Ericsson, and Siemens. Typically, the goal for these systems, counting failures and both scheduled and unscheduled maintenance, is 99.999% (five 9s) availability. This works out to about 5 minutes of downtime per year. An exception is Lucent’s 5ESS, which performs at 99.9999% (six 9s) availability. This equates to only about 30 seconds of downtime per year. To achieve these high levels of performance, each switch is equipped with dual processors, so that if one processor fails, the second one can take over automatically. In essence, the switch can be viewed as two computers running simultaneously, with the backup ready to take over the full processing load instantly if a problem is detected.
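The relationship between an availability percentage and yearly downtime is simple arithmetic, and it can be worth checking when comparing carrier claims. The following is a minimal sketch; the function name and the use of an average 365.25-day year are assumptions made for illustration.

```python
# Convert an availability percentage into expected downtime per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected yearly downtime, in minutes."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


five_nines_minutes = downtime_minutes_per_year(99.999)        # a little over 5 minutes
six_nines_seconds = downtime_minutes_per_year(99.9999) * 60   # a little over 30 seconds
```

Each additional 9 divides the permitted downtime by ten.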

The near 100% availability of these switches is also achieved with redundant subsystems and continuous internal testing. If an internal test reveals that one or more of a subsystem’s performance metrics fall below an established baseline, the backup subsystem takes over while the primary subsystem undergoes a full suite of diagnostic tests to pinpoint the problem. So even though the primary system is not in service, the availability of the switch is not diminished. The switches themselves are closely monitored by on-site technicians as well as remotely from one or more network operations centers (NOCs).

In selecting the services of a CLEC, however, telecom and IT managers should be aware that these carriers’ switches may not be provisioned in the same way as those of the Regional Bell Operating Companies (RBOCs). Many CLECs did not purchase central office switches in the volumes that would qualify them for discounts. Others lacked the negotiating skills to obtain feature parity with the RBOCs. As a result, they do not always have the redundant subsystems and features to provide their equipment with the highest level of reliability. To make a bad situation worse, some CLECs leveraged the reputation for reliability of their vendor’s equipment, while not actually having a configuration that would provide that level of reliability.

15.2.2 Signal Transfer Points

The carriers also operate signal transfer points (STPs), which are the computers that route network inquiries into their signaling networks. These signaling networks are separate from the networks that carry the voice and data traffic of customers; they are packet-switched data networks that use messages to set up calls and support intelligent services. The STPs are configured as mated pairs with separate processors. The load-balanced STP pairs are not collocated but are usually hundreds of miles away from each other and operate at just under 50% capacity. With this architecture, if something happens to one STP, its mate can pick up the full load and operate until repair or replacement of the damaged STP can be made.
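The reason for running each STP at just under 50% of capacity can be shown with a one-line check: the surviving mate must be able to carry the combined load of the pair. This is only an illustrative sketch; the function name and the normalized capacity of 1.0 are assumptions.

```python
def mate_can_absorb(load_a: float, load_b: float, capacity: float = 1.0) -> bool:
    """True if one STP alone could carry the combined load of the mated pair."""
    return load_a + load_b <= capacity


# Both STPs just under 50%: the survivor stays below full capacity.
safe_pair = mate_can_absorb(0.48, 0.49)
# Both STPs at 60%: a single failure would overload the survivor.
overloaded_pair = mate_can_absorb(0.60, 0.60)
```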

15.2.3 Network Control Points

NCPs are the customer databases for advanced services such as 800 number routing. The NCP nodes process 800-number call-routing requests received from telephone switches in the carrier’s network. They have dual processors, but if the second processor should fail, there is a backup NCP that is called into operation, thus protecting the customer’s intelligent services information. With several levels of redundancy, there is little chance that customer information regarding services and features will be lost. By way of comparison, AT&T alone has 310 NCPs in its network—50% more than its nearest competitor—enabling it to provide the highest level of redundancy to cope with virtually any disaster scenario.

15.2.4 Digital Interface Frames

Digital interface frames (DIFs) provide access to and from Class 4 central office switches for processing calls. The DIFs that handle this work have spare units available to take over immediately should a problem occur. Guiding the overall work of each DIF are two controllers running simultaneously, so that if one experiences a problem, the backup controller can take over without the customer noticing.

In addition, certain switched business services such as 800, which entail large-quantity egress (traffic flowing off the carrier’s network), can make use of an optional capability. This feature sends a customer’s traffic to another DIF at another switch location if the customer’s primary switch encounters a problem.

15.2.5 Power Systems

Carrier switching systems derive power from the local utility companies. Power lines come into the building to provide alternating current (ac) to redundant rectifiers, which convert it to direct current (dc) and distribute power to the switching equipment and battery banks. If commercial power fails, batteries, which are kept charged by the rectifiers, provide backup power. The power levels of the batteries are monitored to ensure readiness in case a commercial power outage should occur. An additional stage of redundancy is provided by diesel-fueled generators, which can replace commercial power for days or weeks at a time.

15.2.6 Cable, Building, and Signaling Diversity

A carrier’s network facilities (i.e., cable routes) are built as a series of circles or loops that touch one another to form an interconnecting grid. Should any particular loop be cut, such as by a backhoe operator hitting a fiber cable, a fair amount of traffic can be sent over one or more adjacent fiber loops. Construction of new facilities in recent years has focused on making these loops smaller and smaller to reduce the magnitude of problems when they occur.

Generally, in larger metropolitan areas, carriers are able to offer business customers building diversity. By being able to reach the carrier’s network at two distinct geographic locations, business customers can enhance reliability for their high-capacity switched and/or special services applications.

This is how STPs are protected as well. Each pair of STPs is connected to every other pair of STPs by multiple data links. To ensure that connectivity will always be available, these links are established through three geographically separated routes. Should something happen on one route to disrupt signaling message transfer, the other two routes remain available to keep the carrier’s signaling system operational.

Within each central office switch, there is a device that permits the switch to interface with the carrier’s common channel signaling system to send and accept information that is used to set up and deliver calls. Should this interface device malfunction, the switch can use special data links that are directly connected to one or more “helper” switches to gain access to the signaling network via their interface. In this manner, central office switches can continue to process long-distance calls while a repair is made. AT&T calls this backup signaling capability the alternate signaling transport network (ASTN).

The central office switches can also make use of ASTN should both halves of a mated pair of STPs fail. Each switch normally uses a particular mated pair of STPs to handle call setup. If something happens to the STP pair on which the switch normally relies, the switch can use ASTN to access the signaling network through helper switches that use a different STP pair.
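The fallback logic described in the last two paragraphs can be sketched as a simple selection rule. This is not AT&T's implementation; the Helper class, the function name, and the returned strings are hypothetical and exist only to make the order of fallback explicit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Helper:
    """A hypothetical helper switch reachable over a direct data link."""
    name: str
    interface_ok: bool   # helper's own signaling interface is working
    stp_pair_ok: bool    # helper's mated STP pair is working


def pick_signaling_route(own_interface_ok: bool,
                         own_stp_pair_ok: bool,
                         helpers: List[Helper]) -> Optional[str]:
    """Return how the switch reaches the signaling network, or None if it cannot."""
    if own_interface_ok and own_stp_pair_ok:
        return "direct"
    # Either the local interface or the home STP pair is down:
    # borrow the interface (and STP pair) of the first healthy helper.
    for helper in helpers:
        if helper.interface_ok and helper.stp_pair_ok:
            return "via helper " + helper.name
    return None


helpers = [Helper("H1", interface_ok=False, stp_pair_ok=True),
           Helper("H2", interface_ok=True, stp_pair_ok=True)]
route_after_stp_failure = pick_signaling_route(True, False, helpers)
```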

15.2.7 Real-Time Network Routing

Some of the larger carriers employ very sophisticated call-routing schemes. The network of switches belonging to AT&T, for example, routes calls through a system known as real-time network routing (RTNR). This software system enables every switch within the AT&T network to know the available resource capacity of every other switch in the network on a real-time basis. Since AT&T has more than 130 switches in its U.S. network, customer calls will have more than 130 ways to be routed across the network. This path diversity enables high call-completion rates despite regional congestion and resource constraints. Together, the redundancy and alternate routing features of AT&T and other public networks enable AT&T to offer customers special restoration services.
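The idea of routing on real-time knowledge of spare capacity can be illustrated with a small sketch. It is not the RTNR algorithm itself, whose details are not given here. It assumes a table of idle trunks between switch pairs, tries the direct route first, and otherwise picks the intermediate switch whose two legs leave the most spare capacity. All names and figures are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]


def idle(idle_trunks: Dict[Link, int], a: str, b: str) -> int:
    """Idle trunks between two switches, treating each link as bidirectional."""
    return idle_trunks.get((a, b), idle_trunks.get((b, a), 0))


def choose_route(src: str, dst: str, switches: List[str],
                 idle_trunks: Dict[Link, int]) -> Optional[List[str]]:
    """Prefer the direct route; otherwise use the least-loaded two-link route."""
    if idle(idle_trunks, src, dst) > 0:
        return [src, dst]
    best_via, best_spare = None, 0
    for via in switches:
        if via in (src, dst):
            continue
        # A two-link route is only as good as its more heavily loaded leg.
        spare = min(idle(idle_trunks, src, via), idle(idle_trunks, via, dst))
        if spare > best_spare:
            best_via, best_spare = via, spare
    return [src, best_via, dst] if best_via is not None else None


switches = ["A", "B", "C", "D"]
idle_trunks = {("A", "B"): 0, ("A", "C"): 5, ("C", "B"): 2,
               ("A", "D"): 4, ("D", "B"): 3}
route = choose_route("A", "B", switches, idle_trunks)
```

With N switches there are N - 2 possible intermediate switches for any pair, which is why a large switch count translates into many alternative paths.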

Carrier Restoration Services

For most customers, a 5-minute service interruption is within tolerable limits. But other customers, such as businesses that rely extensively on inbound or outbound calling, need a much shorter restoration period. For these customers, carriers offer optional services that can be used to meet individual reliability requirements. These can range from the carrier planning and building a complete private network, to customers using one or more optional reliability features to meet highly specific needs. Of the more than 1,200 carriers operating in the United States today, none has a more comprehensive set of restoration services than AT&T. The following offerings, however, can also be used to make comparisons with other carriers:

Split-access flexible egress routing (SAFER). For users of dedicated access facilities, SAFER provides a backup mechanism to protect the exit ramp of an organization’s toll-free service to ensure consistent and reliable access to its customers. SAFER protects against network congestion, access facility failure, or a service disruption at the AT&T switch. SAFER can also be used to protect a business from a failed or busy T1 facility. If toll-free calls cannot complete through the normal terminating network switch, SAFER redirects these calls through an alternate switch in the AT&T network. This gives callers an alternate route to the business location. This mechanism is automatically activated in near real-time, whenever it is needed.
Alternate destination call routing (ADCR). For customers with toll-free operations in more than one location, ADCR allows the AT&T switch that normally carries the calls to the company’s location to route incoming calls to another business location automatically when a problem arises. For example, if a company’s ACD at the main location is unavailable or too busy with calls, any additional calls would be forwarded automatically to an alternate location. Calls could be directed either through the original AT&T switch or through an alternate switch, thus protecting against disruptions in AT&T switches, local exchange switches, or customer equipment.
Network protection capability (NPC). For digital service customers, the optional NPC provides a geographically diverse backup facility and will usually switch traffic to this backup route within 20 ms of a service interruption. When the service is fully restored, the NPC automatically routes traffic according to the original configuration. The backup and restoration processes occur so rapidly that the customer will not notice any disruption to service. If data is in transit during the configuration change, no data will be lost.
Enhanced diversity routing option (EDRO). To protect a business from service disruptions in the event of a cable cut or natural disaster, EDRO provides customers with a documented physical and electrical circuit diversity program. As part of EDRO, AT&T designs and maintains physically separate paths through its network to eliminate common points of failure between circuits. Under this option, diverse circuits are separated by at least 100 feet and avoid common AT&T buildings to further reduce the possibility of a common point of failure.
Access protection capability (APC). To protect the access portion of a customer’s circuit, APC provides immediate recovery of access circuits from certain network failures by automatically transferring service to a dedicated, separately routed access circuit.
Customer controlled reconfiguration (CCR). This service is available in conjunction with AT&T’s digital access and cross-connect system (DACS). CCR offers a means to route around failed facilities. The DACS is not a switch (PBX) that can be used for setting up calls in real time or for performing alternate routing on a dynamic basis; it is simply a static routing device. Originally designed to automate the process of circuit provisioning to avoid having a carrier’s technician manually patch the customer’s derived 64-Kbps DS0 channels to designated long-haul transport facilities, the DACS allows CCR subscribers to organize and manage their own circuits from an on-premises management terminal. Any changes will take a few minutes to a half-hour to implement because changes must be uploaded to the carrier’s network before they take effect.
Bandwidth manager services. For data services, AT&T offers network managers the capability of fine-tuning their WAN to handle dynamic applications requirements, such as LAN interconnection, videoconferencing, and traffic-load balancing. In addition, bandwidth management services can be used to automatically restore dedicated private-line circuits or redirect private line and frame relay service circuits to a backup location in the event of a circuit failure and/or a disaster at the primary site.
T1.5 reserved service. This service supports applications requiring T1 (1.544 Mbps) speeds. AT&T brings a dedicated T1 facility on-line only after the customer verbally requests it with a phone call. This restoration solution requires that the customer pre-subscribe to the service and that local-access facilities already be in place.

Fiber Network Restoration

The major carriers in the United States operate extensive fiber networks and have implemented architectures with sophisticated protection mechanisms to ensure uninterrupted voice and data services. Many of the smaller carriers with fiber networks, on the other hand, ran out of investment capital while expanding the reach of their backbones and did not have time to build much redundancy into their networks. Companies with mission-critical applications, therefore, should exercise due diligence when considering such carriers.

AT&T is one carrier that provides extensive restoration services for its fiber network. The carrier’s Fast Automatic Restoration (FASTAR) system provides automated facilities restoration for all types of services (special services and switched) traveling over AT&T’s fiber-optic transmission systems. FASTAR is designed to restore 90% to 95% of network circuits within 2 minutes, while FASTAR II is capable of rerouting circuits within 60 ms of a failure.

Specifically, FASTAR is a routing algorithm used to instruct DACS systems to reroute traffic around failed or congested routes. In the event of a fiber-optic cable cut in the core network, FASTAR automatically locates the exact site of the cut and transfers the affected circuits to spare capacity going around the cut. In this way, 72 T3 circuits can be rerouted by FASTAR within 5 minutes. Before FASTAR was introduced in 1992, T3 circuits had to be rerouted manually at patch panels, which could take hours.

When a facilities problem occurs, such as a cable cut, the following activities are typically performed by the FASTAR system:
The problem is identified;
The exact location of the problem is determined;
The amount and location of currently available protection or backup/spare facilities is determined;
A substitute route is constructed from the available spare facilities;
The substitute route is tested to ensure it is operational and of high quality;
The traffic on the damaged route is moved to the substitute route.

The FASTAR system goes through all these facilities restoration steps outlined but at computer speed and on a fully automated basis.
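The route-construction step in the list above can be sketched as a search over the links that still have spare capacity. The following is a generic breadth-first search offered as an illustration, not as the FASTAR algorithm; the topology, the capacity figures, and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def find_substitute_route(spare: Dict[str, Dict[str, int]], src: str, dst: str,
                          failed_link: Tuple[str, str]) -> Optional[List[str]]:
    """Find a route from src to dst over links with spare capacity,
    avoiding the failed link. Returns None if no such route exists."""
    failed = frozenset(failed_link)
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor, capacity in spare.get(node, {}).items():
            if neighbor in visited or capacity <= 0:
                continue  # already explored, or no spare capacity on this link
            if frozenset((node, neighbor)) == failed:
                continue  # this is the cut span
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Spare circuits available on each link of a small four-node network.
spare = {"A": {"B": 3, "C": 2}, "B": {"A": 3, "D": 1},
         "C": {"A": 2, "D": 2}, "D": {"B": 1, "C": 2}}
substitute = find_substitute_route(spare, "A", "B", failed_link=("A", "B"))
```

A production system would also test the substitute route before moving traffic onto it, as the list notes; that step is omitted here.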

FASTAR II operates in conjunction with a network consisting of more than 50 double-interconnected SONET rings with ATM switching at crossover points to create a ring and mesh architecture for data services. Using overlapping, self-healing rings, FASTAR II can restore certain types of network failures, such as simple cable cuts, in milliseconds. With this type of outage, clients often do not even notice that traffic was interrupted.

Many other carriers also use SONET for disaster recovery, including RBOCs and CLECs. Fiber is deployed in redundant rings around major metropolitan areas and high-traffic corridors between major cities. SONET fiber facilities are typically configured in a dual counter-rotating ring topology, as illustrated in Figure 15.1. This topology makes use of self-healing mechanisms in SONET-compliant equipment [i.e., add-drop multiplexers (ADMs)] to ensure the highest degree of network availability and reliability. In the event of a break in the line, traffic is automatically switched from one ring to the other, thus maintaining the integrity of the network. In the unlikely event that both the primary and secondary lines fail, the SONET-compliant equipment adjacent to the failures automatically loops the data between rings, thus forming a new C-shaped ring from the operational portions of the original two rings. When the break is fixed, the network automatically returns to its original state.

Figure 15.1: Self-healing SONET-compliant fiber-ring topology. In this scenario, if the inner ring is cut or fails, traffic is rerouted in the opposite direction on the outer ring. The SONET equipment at node D changes the direction of the traffic.
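The effect of a cut on a dual counter-rotating ring can be modeled in a few lines. This sketch is a simplification under stated assumptions: traffic normally travels in one direction around a list of nodes, and if the cut span lies on that path, the traffic is sent in the opposite direction instead. The node names and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def ring_path(nodes: List[str], src: str, dst: str,
              cut_span: Optional[Tuple[str, str]] = None) -> List[str]:
    """Return the path from src to dst, reversing direction if the
    normal path crosses the cut span."""
    n = len(nodes)
    start, end = nodes.index(src), nodes.index(dst)

    def walk(step: int) -> List[str]:
        path, k = [src], start
        while k != end:
            k = (k + step) % n
            path.append(nodes[k])
        return path

    def crosses_cut(path: List[str]) -> bool:
        if cut_span is None:
            return False
        cut = frozenset(cut_span)
        return any(frozenset(pair) == cut for pair in zip(path, path[1:]))

    working = walk(+1)
    # The two directions share no span, so the reverse path avoids the cut.
    return working if not crosses_cut(working) else walk(-1)


nodes = ["A", "B", "C", "D"]
normal = ring_path(nodes, "A", "C")
rerouted = ring_path(nodes, "A", "C", cut_span=("B", "C"))
```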

SONET’s embedded management channels give carriers and users alike more capabilities for continuous monitoring and preemptive corrective action in response to impending trouble conditions. In private SONET networks, network managers can reconfigure channels and facilities without the involvement of telephone companies. Through software programming, it is even possible to map SONET circuits so that they can be automatically rerouted to alternate carrier facilities should a failure occur on the primary circuit(s).

A new generation of optical systems has become available that offers much more bandwidth than SONET. Wavelength division multiplexing (WDM) uses the different colors in the light source as separate high-speed channels. Each channel can support a particular service, such as T-carrier for private lines, Gigabit Ethernet for LAN interconnectivity, Fibre Channel for storage-area networking, or ESCON for IBM mainframe connectivity. WDM-equipped fiber links can also transport SONET payloads. The WDM systems transparently carry SONET’s embedded overhead channels, which perform link supervision and gather performance statistics, allowing SONET’s fault-recovery procedures to operate as normal to ensure network availability. With a 50-ms recovery time, WDM matches the recovery performance of SONET in case of link failures, allowing both technologies to play complementary roles. And with embedded supervisory channels, WDM systems can report on a number of performance metrics to help diagnose problems with individual channels, as well as with the fiber link.

LAN Restoration Planning

Network Reliability

A network or resource is reliable when it continues to operate despite the failure of a critical element. The critical elements are different for each network topology: star, ring, and bus. Thus, each topology can be evaluated in terms of its reliability, as well as its suitability for specific applications.

3.2.1 Star Topology

When it comes to link availability, the star topology is highly reliable. In the star topology, all network devices (i.e., nodes) or LAN segments connect to a central hub. Although the loss of a link prevents communication between the hub and the affected node, all other nodes will continue to operate as before unless the hub itself suffers a catastrophic failure.

To ensure a high degree of reliability, the hub has redundant subsystems at critical points: the control logic, backplane, and power supply. The hub’s management system can enhance the fault tolerance of these redundant subsystems by continuously monitoring their operation and reporting any anomalies. With the power supply, for example, monitoring may include hotspot detection and fan operation to detect trouble before it disrupts hub operation. Upon the failure of the main power supply, the redundant unit switches over automatically or manually under the network manager’s control without disrupting the network.

The flexibility of the hub architecture lends itself to varying degrees of fault tolerance, depending on the criticality of the applications. For example, workstations running non-critical applications may share a link to the same LAN module at the hub. Although this configuration might seem economical, it is disadvantageous in that a failure in the LAN module will put all of the workstations on that link out of commission. A slightly higher degree of fault tolerance may be achieved by distributing the workstations among two LAN modules and links. That way, the failure of one module would affect only half the number of workstations. A one-to-one correspondence of workstations to modules offers an even greater level of fault tolerance, because the failure of one module impacts only the workstation connected to it. However, this configuration is also a more expensive solution than the others.
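The trade-off in this paragraph reduces to a ceiling division: the more modules the workstations are spread across, the fewer are lost when one module fails. The sketch below assumes the workstations are distributed as evenly as possible; the function name is hypothetical.

```python
def stations_lost_per_module_failure(workstations: int, modules: int) -> int:
    """Worst-case number of workstations put out of service when one
    LAN module fails, assuming an even spread across modules."""
    if workstations < 0 or modules < 1:
        raise ValueError("need at least one module and a non-negative station count")
    return -(-workstations // modules)  # ceiling division


shared_module = stations_lost_per_module_failure(10, 1)    # all 10 stations
two_modules = stations_lost_per_module_failure(10, 2)      # half of them
one_to_one = stations_lost_per_module_failure(10, 10)      # a single station
```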

A critical application may demand the highest level of fault tolerance. This can be achieved by connecting the workstation to two LAN modules at the hub with separate links. The ultimate in fault tolerance would be achieved by connecting one of those links to a different hub. In this arrangement, a transceiver is used to split the links from the application’s host computer, enabling each link to connect with a different module in the hub or to a different hub. All of these levels of fault tolerance are summarized in

3.2.2 Ring Topology

In its pure form, the ring topology offers poor reliability in the face of both node and link failures. The ring uses link segments to connect adjacent nodes together. Each node is actively involved in the transmissions of other nodes through token passing. The token is received by each node, at which time it can transmit data before passing the token to the adjacent node. The loss of a link not only results in the loss of a node but brings down the entire network as well. Enhancing the reliability of the ring topology requires adding redundant links between nodes as well as bypass circuitry. Adding such components, however, makes implementing the ring topology more expensive.

3.2.3 Bus Topology

The bus topology also provides poor reliability. If the link fails, that entire segment of the network is rendered useless. A redundant link for each segment will increase the reliability of the bus topology, but at extra cost. Unlike the ring topology, where each node is dependent on the others adjacent to it, the nodes in a bus topology are independent and contend for access to the LAN. If a node fails, the rest of the network continues to operate.

Network Availability

Availability is a measure of performance dealing with the LAN’s ability to support all users who wish to access it. A network that is highly available provides services immediately to users, whereas a network that suffers from low availability typically forces users to wait for access. The topology of the LAN influences availability.

Availability on the bus topology is dependent on load, the access control protocol used, and the length of the bus. With a light load, availability is virtually assured for any user who wishes to access the network. As the load increases, however, so does the chance of collisions. When a collision occurs, the transmitting nodes back off and try again after a short interval. The chance of collisions also increases with bus length.
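The text does not say how the back-off interval is chosen. One widely used scheme, assumed here for illustration, is the truncated binary exponential backoff of classic Ethernet (CSMA/CD): after each successive collision, the node waits a random number of slot times drawn from a range that doubles, up to a cap. The 51.2-microsecond slot time is that of 10-Mbps Ethernet; the function name is hypothetical.

```python
import random

SLOT_TIME_US = 51.2   # slot time of 10-Mbps Ethernet, in microseconds
MAX_EXPONENT = 10     # the range stops doubling after 10 collisions


def backoff_delay_us(collisions: int, rng: random.Random) -> float:
    """Delay before the next retry after `collisions` successive collisions."""
    if collisions < 1:
        raise ValueError("backoff applies only after at least one collision")
    exponent = min(collisions, MAX_EXPONENT)
    slots = rng.randint(0, 2 ** exponent - 1)  # both ends inclusive
    return slots * SLOT_TIME_US


rng = random.Random(42)  # seeded only to make the example repeatable
delays = [backoff_delay_us(attempt, rng) for attempt in range(1, 6)]
```

Because the range widens with each collision, two colliding nodes become less and less likely to pick the same retry slot, at the cost of longer waits under heavy load.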

A network based on a star topology can only support what the central hub can handle. In any case, each LAN module in the hub can handle only one request at a time, which can impact other users on that segment during heavy load conditions. Hubs equipped with multiple processors and LAN modules can alleviate this situation somewhat, but even with multiple processors, there will not usually be a one-to-one correspondence between users and processors. Such a system would be cost-prohibitive.

In terms of network availability, the ring topology scores higher than either the bus or star topology. This is because each node on the ring has an equal chance at accessing the network, which is governed by the token. However, since each node on the ring must wait for the token before transmitting data, the time interval allotted for transmission decreases as the number of nodes on the ring increases.
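The effect of ring size on waiting time can be estimated with a simple upper bound: in the worst case, a node waits while every other node holds the token for its full allowance and the token crosses every link once. This is a rough model under those assumptions, not a description of any particular token-ring standard; the parameter names are hypothetical.

```python
def worst_case_token_wait_ms(nodes: int, max_hold_ms: float,
                             hop_latency_ms: float) -> float:
    """Upper bound on the wait for the token when all other nodes
    transmit for their full token-holding time."""
    if nodes < 1:
        raise ValueError("a ring needs at least one node")
    return (nodes - 1) * max_hold_ms + nodes * hop_latency_ms


small_ring = worst_case_token_wait_ms(10, max_hold_ms=10.0, hop_latency_ms=0.01)
large_ring = worst_case_token_wait_ms(50, max_hold_ms=10.0, hop_latency_ms=0.01)
```

The bound grows linearly with the number of nodes, which is the cost of the ring's guaranteed, equal access.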

Recovery Options

The LAN is a data-intensive environment requiring special precautions to safeguard one of the organization’s most valuable assets—information. The procedural aspect of minimizing data loss entails the implementation of manual or automated methods for backing up all data on the LAN to avoid the tedious and costly process of recreating vast amounts of information. The equipment aspect of minimizing data loss entails the use of redundant circuitry, as well as components and subsystems that are activated automatically upon the failure of various LAN devices to prevent data loss and maintain network availability.

In addition to the ability to respond to errors in transmissions by detection and correction, other important aspects of LAN operation are recovery and reconfiguration. Recovery deals with bringing the LAN back to a stable condition after an error, and reconfiguration is the mechanism by which the network is restored to its previous condition after a failure.

LAN reconfigurations involve mechanisms to restore service upon loss of a link or network interface unit. To recover or reconfigure the network after failures or faults requires that the network possess mechanisms to detect that an error or fault has occurred and to determine how to minimize the effect on the system’s performance. Generally, these mechanisms provide the following:

Performance monitoring;
Fault location;
Network management;
System availability management;
Configuration management.

These mechanisms work in concert to detect and isolate faults, determine their effects on the system, and remedy these conditions to bring the network to a stable state with minimal impact on network availability.

Reconfiguration is a fault management scheme used to bypass major failures of network components. This process entails detecting that a fault condition has occurred that cannot be corrected by merely restarting the equipment. Once it is determined that a fault has occurred, its impact on the network is assessed so that an appropriate reconfiguration can be formulated and implemented. In this way, normal operations can continue under a new configuration until the problem can be fixed and the network restored to its primary configuration.

Fault detection is augmented by logging systems that keep track of failures over a period of time. This information is examined to determine trends that adversely affect network performance. This information, for example, might reveal that a particular component is continually causing problems on the network, or the monitoring system might detect that a component on the network has a higher-than-normal failure rate.
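A failure log of this kind lends itself to a simple trend check. The sketch below counts failures per component and flags any component whose count is well above the average. The 1.5x threshold, the component names, and the function name are assumptions for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, List


def flag_problem_components(failure_log: Iterable[str],
                            factor: float = 1.5) -> List[str]:
    """Return components whose failure count exceeds `factor` times
    the average failure count across all logged components."""
    counts = Counter(failure_log)
    if not counts:
        return []
    average = mean(counts.values())
    return sorted(name for name, n in counts.items() if n > factor * average)


# Each entry is the component named in one logged failure.
log = ["hub-1", "nic-7", "nic-7", "nic-7", "nic-7",
       "nic-7", "bridge-2", "nic-7", "hub-1"]
suspects = flag_problem_components(log)
```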

The configuration assessment component of the reconfiguration system uses information about the current system configuration—including connectivity, component placement, paths, and traffic flows—and maps it against the failed component. This information is analyzed to indicate how that particular failure is affecting the system and to isolate the cause of the failure. Once this assessment has been performed, a solution can be worked out and implemented.

The solution may consist of reconfiguring most of the operational processes to avoid the source of the fault. The solution determination component examines the configuration and the affected hardware or software components, determines how to move resources around to bring the network back to an operational state or indicates what must be eliminated because of the failure, and identifies network components that must be serviced.

Determining the most effective course of action is based on the criticality of keeping certain functions of the network operating and maintaining the resources available to do this. In some environments, nothing can be done to restore service because of device limitations (e.g., lack of redundant subsystems) or the lack of spare bandwidth. In such cases, about all that can be done is to indicate to the servicing agent what must be corrected and keep users informed of the situation.

Once an alternate configuration has been determined, the reconfiguration system implements it. In most cases, this means rerouting transmissions, moving and restarting processes from failed devices, or reinitializing software that has failed because of some intermittent error condition. In some cases, nothing may need to be done except notify affected users that the failure is not severe enough to warrant system reconfiguration.

Geographically distributed LANs can be inter-networked over the WAN using such devices as bridges and routers connected to leased lines and/or switched services. An advantage of using routers for this purpose is that they permit the building of large mesh networks. With mesh networks, the routers can steer traffic around points of congestion or failure and balance the traffic load across the remaining links. In addition, routers have flow control and more comprehensive error protection than bridges.

Bridges are useful for reducing the size of sprawling LANs into discrete subnetworks that are easier to control and manage. Through the use of bridges, similar devices, protocols, and transmission media can be grouped together into communities of interest. Such partitioning can yield many advantages, such as eliminating congestion and improving the response time of the entire network. Subnetworks are also useful for testing new applications before making them available over the enterprise network.

LAN Administration

2.1 Introduction

The LAN administrator’s main focus is usually on keeping the network operating properly and making sure the needs of users are addressed in a timely manner, including hardware and software upgrades. To meet the needs of all users, the LAN administrator must have appropriate tools to accomplish a number of specific tasks. Many of these tasks can be automated to enable the LAN administrator to take care of multiple networks that may consist of hundreds of servers, desktop computers and peripherals—the configurations of which may change on a daily basis to meet the varying needs of mobile professionals, telecommuters, workgroups, departments, or the organization as a whole. Many of these tools may come bundled with the LAN vendor’s network management system. Some are bundled with help desk software. Others are available from third-party vendors as standalone products that can be launched from the network management system or help desk. All of these different management and administration systems and tools can even share data via application programming interfaces (APIs).

Whether bundled with other products or used separately, the right tools help the LAN administrator monitor, analyze, and adapt the LAN to changing organizational needs. The tools themselves are applications and utilities based on NetWare, Windows, or UNIX. In large heterogeneous environments, the LAN administrator will have occasion to use tools that work with multiple network operating systems. With the right tools, the LAN administrator can access multiple functions and client operating systems through a consistent graphical user interface, which can greatly improve personal performance.

Console and Agents

The key concepts in LAN administration are the console and agents. The console is the workstation that is set up to view information collected by the agents. The agents are special programs that are designed to retrieve specific information from the network. An application agent, for example, works on each workstation to log application usage. Workstation users are not aware of the agent and it has no effect on the performance of the workstation or the applications running on it. The collected information is organized into data sets and stored in a relational database, where it can be retrieved for viewing on the LAN administrator’s console.
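The console and agent relationship described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's product: the table name app_usage, its columns, and the functions agent_log and console_summary are hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import sqlite3

def make_store():
    """Create an in-memory relational store (stands in for the central database)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE app_usage (workstation TEXT, application TEXT, minutes INTEGER)"
    )
    return db

def agent_log(db, workstation, application, minutes):
    """Application agent: record one usage sample for a workstation."""
    db.execute(
        "INSERT INTO app_usage VALUES (?, ?, ?)", (workstation, application, minutes)
    )

def console_summary(db):
    """Console: total minutes of use per application, busiest first."""
    rows = db.execute(
        "SELECT application, SUM(minutes) FROM app_usage "
        "GROUP BY application ORDER BY SUM(minutes) DESC, application"
    )
    return rows.fetchall()

db = make_store()
agent_log(db, "ws-01", "spreadsheet", 40)
agent_log(db, "ws-02", "spreadsheet", 25)
agent_log(db, "ws-02", "mail", 10)
summary = console_summary(db)
print(summary)
```

A summary of this kind is the sort of data set a console might present for license management, since it shows how heavily each application is actually used.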

Information from multiple sets of data can be displayed in several ways—cells, charts, text—and analyzed for such purposes as license management or inventory management, and printed as a detail or summary report. The entire process is illustrated in Figure 2.1. A comprehensive tool set allows the LAN administrator to perform the following main functions:

View and manipulate network data;
Automate file distribution;
Maintain hardware inventory;
Manage installed software, including application usage;
Receive notification of network events;
Establish and manage network printer support;
Automate network processes, such as backup and virus detection;
Monitor disk and file usage;
Create task lists;
Work with text files;
Establish and maintain security;
Manage storage.

Figure 2.1: Information flow between console and agent.

All of the agents that collect information in support of these functions are configured at the console using commands selected from the menu bar. Once configured, each type of agent can be assigned an icon that launches its associated viewer for displaying collected information.

With LANs increasingly being interconnected over wider geographical areas, network administrators can make use of agents to monitor WAN links as well. The agents play a role similar to the one monitors and protocol analyzers play in hardware. Although the agents collect the same information as the monitors, they also process the packets to provide detailed and high-level information regarding network traffic. In this way, they resemble protocol analyzers.

Hardware-based monitors and software-based agents can be used together, distributed throughout a LAN as well as dispersed geographically across the WAN. Their packet-capture capabilities, with filtering and decoding, allow early detection of suspect traffic patterns and identification of faulty network devices. Since agents use the network only when information is requested from the network management system, they do not burden the network with unnecessary overhead.

2.2.1 Intelligent Agents

A critical tool in the IT department’s arsenal of management tools is the “intelligent” agent, which is an autonomous and adaptive software program that accomplishes its tasks by executing commands remotely. System administrators, network managers, and software developers can create and use intelligent agents to execute critical processes, including performance monitoring, fault detection and restoration, hardware and software asset management, virus protection, and information search and retrieval. One of the latest applications of intelligent agents is intrusion detection, in which the agent reports security breaches at a router or firewall and takes appropriate steps to prevent further attacks. With the agent concept enjoying increasing acceptance, vendors are offering integrated development environments for creating agents, agent managers for deploying and managing agents across a network, and sample intelligent agents that are ready-to-run and can be customized for particular needs without requiring any programming skills.

What makes these agents so smart is the addition of programming code that tells them exactly what to do, how to do it, and when to do it. In essence, the intelligent agent plays the dual role of manager and agent. Under this scheme, polling is localized, events and alarms are collected and correlated, various tasks and trouble responses are automated, and only the most relevant information is forwarded to the central management station. In the process, network traffic is greatly reduced, as is the time for problem resolution.
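The manager-and-agent role described above (local polling, correlation, and forwarding of only the most relevant information) might look roughly like the following sketch. The function names, the 0.90 utilization threshold, and the rule of forwarding a device only after two or more threshold crossings are all assumptions made for illustration.

```python
def poll_locally(readings, threshold):
    """Local polling: keep only (device, value) readings above the threshold."""
    return [(dev, val) for dev, val in readings if val > threshold]

def correlate(events):
    """Collapse repeated events per device into (count, peak value)."""
    merged = {}
    for dev, val in events:
        count, peak = merged.get(dev, (0, 0.0))
        merged[dev] = (count + 1, max(peak, val))
    return merged

def forward(merged, min_count):
    """Forward to the central station only devices seen at least min_count times."""
    return {dev: info for dev, info in merged.items() if info[0] >= min_count}

# Five local readings (device, utilization); only one device is a repeat offender.
readings = [
    ("router-a", 0.95), ("switch-b", 0.40), ("router-a", 0.97),
    ("hub-c", 0.91), ("router-a", 0.99),
]
report = forward(correlate(poll_locally(readings, threshold=0.90)), min_count=2)
print(report)
```

Five readings are reduced to a single forwarded entry, which is the traffic reduction the paragraph describes.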

2.2.2 Agent Behavior

The behavior of intelligent agents can be modified in two ways: templates and programs. The choice will depend on the level of an organization’s in-house network and systems management expertise.

Template Solutions

Some vendors, such as Hewlett-Packard, offer rules-based templates to modify the behavior of intelligent agents without the need for native-language programming. The role of the agent is defined in a template that tells the network management system what to do with the information collected by the agent. A network manager can bring up a representation of the template used for monitoring a particular application, for example, and edit the rules concerning responses to various alerts.

For instance, when a firewall issues an error message, under the rules described in its template, it sends all alerts to a particular system administrator. The network manager can change the rule so that an automated response is initiated instead, allowing agents to resolve problems and perform routine tasks (e.g., backups, batch jobs, file maintenance) locally. This prevents the system administrator from being overwhelmed by warning and informational messages, so he or she can focus only on potential service-disrupting conditions that cannot be resolved locally.

In a similar manner, responsibilities can be assigned to specific people. For instance, an operator can be assigned a particular group of Internet servers according to subsidiary company, department, or location. Likewise, responsibilities also can be assigned by type of application, such as electronic commerce implemented by various publicly accessible Web servers, or by the expertise of various site personnel [e.g., Webmaster, Common Gateway Interface (CGI) programmer, Java application developer, certified security engineer]. The advantage of templates is that they can alter the reporting behavior of agents without the need to rewrite the agents themselves.
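A rules-based template of this kind can be pictured as a lookup table that is edited rather than reprogrammed. The sketch below is not Hewlett-Packard's template format; the dictionary layout and the action names notify_admin and handle_locally are hypothetical.

```python
# Default template: every firewall alert goes to the system administrator.
DEFAULT_TEMPLATE = {
    "firewall": {
        "error": "notify_admin",
        "warning": "notify_admin",
        "info": "notify_admin",
    },
}

def edit_rule(template, source, severity, action):
    """Return a copy of the template with one rule changed; the agent is untouched."""
    updated = {src: dict(rules) for src, rules in template.items()}
    updated.setdefault(source, {})[severity] = action
    return updated

def respond(template, source, severity):
    """Look up the action for an alert; unknown alerts go to the administrator."""
    return template.get(source, {}).get(severity, "notify_admin")

# The network manager changes one rule so informational messages are handled locally.
tuned = edit_rule(DEFAULT_TEMPLATE, "firewall", "info", "handle_locally")
print(respond(tuned, "firewall", "info"))
print(respond(tuned, "firewall", "error"))
```

Only the table changes between the default and tuned behavior, which mirrors the point that templates alter reporting behavior without rewriting the agents.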

Programmatic Solutions

For programmers, many vendors offer tool kits that accelerate the development of the agent and manager components, which is normally a significant and time-consuming activity. Without a tool kit, each agent must be hand-coded, that is, built from scratch, a process that can take days or weeks. The use of tool kits can reduce development time to only minutes, allowing developers to spend more time on the value-added components of their application, such as processing data gathered by the agent or communicating with, and controlling, external devices.

The agent-creation process is further simplified because developers now can use an intuitive C++ interface that insulates them from the complexities of APIs. For example, without using a tool kit, a developer might have to write more than 200 lines of code to create a simple “get” request. With a tool kit, such agent development can take as few as four lines of code, with the rest of the code being generated automatically. By drastically reducing the amount of manual coding, developer errors are reduced, and quality and productivity are increased. In addition, the code-generation process provides greater code consistency, thus improving code quality and maintainability as well.

Likewise, manager development is also enhanced through a convenient C++ interface that insulates the developer from complex object manipulations. This interface may result in as much as a tenfold reduction in the lines of code for writing manager requests and defining agent responses, for example.

Some tool kits are actually elements of an integrated suite of tools and platforms that facilitate and accelerate the development and deployment of agent- or manager-based network management solutions. These tools are targeted at various phases of the software development life cycle: requirements analysis, high-level design, detailed design, test, and implementation.

It is not enough to have agent-manager development tool kits—there must be a means to test the results before implementation in the live environment. For this task, there are test tool kits that automatically create a suite of tests and provide automated and interactive methods to send those tests to an agent and receive performance reports that aid in further development.

Through the use of interactive and regression tests, the agent tester tool kit fully exercises the agent during customization and testing. The interactive test method provides the ability to incrementally test the customization of the agent, while the regression-testing method allows for a complete suite of tests to be executed, with the results being verified against the expected results. The agent tester tool kit also gives developers the flexibility to customize generated test programs, incorporating event-handling and response, error-handling, and complex MIB definitions.

This level of automation means that developers can completely test their agents without ever writing code, enabling rapid deployment of effective and reliable management solutions, while reducing development costs, improving quality, and shortening the development cycle.

Agents can be built with Java and used to monitor and report on key performance metrics of systems, services, and applications. Since Java is a cross-platform development tool, agents built with Java can provide a single, unified management system to support any mix of IP-based desktop, server, and network resources that also run Java—including hubs, switches, and routers. In addition to relieving the burden of front-line managers, who usually must cope with a collection of unrelated tools while demands on them are accelerating, the Java agents can self-populate through the network to add new resource support and functionality enhancements.

The agents can also collaborate to resolve problems directly—and without alarm generation—rather than escalating them to a higher-level manager in the traditional way. This intelligence reduces management traffic on the network, enables faster response to events, and reduces administration costs. In addition, the agents can be managed directly via a basic Web browser or through an existing SNMP management application.

2.2.3 Agent Applications

Agent technology has been available for several years and still represents one of the fastest growing areas in network management—and for good reason. In a global economy that encourages the expansion of networks to reach new markets and discourages the addition of personnel to minimize operating costs, it simply makes sense to automate as many management tasks as possible through the use of intelligent agents. In recognition of these new business realities, the list of tasks that are being handled by agents is continually growing.

Performance Management

Network performance monitoring can help determine network service-level objectives by providing measurements to help managers understand typical network behavior and normal periods. The challenge is defining “typical” and “normal.” Intelligent agents can help define the network’s behavior and gather the information for documenting achieved performance levels. The following capabilities of intelligent agents are particularly useful for building a network performance profile:

Baselining and network trending: Identifies the true operating envelope of the network by defining typical and normal behavior that can be used to compare performance at some time in the future, perhaps to see if service level objectives are still being met and reveal out-of-norm conditions, which, if left unchecked, may have drastic consequences on the productivity of users.
Application usage and analysis: Identifies the overall load of network traffic, what times of the day certain applications load the network, which applications are running between critical servers and clients, and what their load is throughout the day, week, and month. Application usage and analysis allows the network manager to discover important performance information on a real-time or historical basis.
Client-server performance analysis: Identifies which servers may be overutilized, which clients are hogging server resources, and what applications or protocols they are running. Such performance analyses help the network manager define and adhere to client-server performance objectives.
Internetwork perspective: Identifies traffic rates between subnets so the network manager can find out which nodes are using WAN-links to communicate. This information can be used to define typical rates between interconnect devices. This perspective can show how certain applications use the critical interconnect paths and define normal WAN use for applications.
Data correlation: Allows peak network usage intervals to be selected throughout the day to determine which nodes are contributing to the network load at that peak point in time. Traffic source and associated destinations can be determined with seven-layer protocol identification.
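Baselining, the first capability in the list above, amounts to summarizing past measurements and comparing new ones against that summary. The following minimal sketch assumes that "normal" means within three standard deviations of the historical mean; that rule, the sample values, and the function names are illustrative choices, not something the text prescribes.

```python
from statistics import mean, pstdev

def build_baseline(samples):
    """Summarize historical utilization samples as (average, spread)."""
    return mean(samples), pstdev(samples)

def out_of_norm(value, baseline, k=3.0):
    """Flag a new measurement more than k standard deviations from the average."""
    avg, spread = baseline
    return abs(value - avg) > k * spread

# Hypothetical percent-utilization samples gathered by an agent over past intervals.
history = [40, 42, 38, 41, 39, 40, 43, 37]
baseline = build_baseline(history)

print(out_of_norm(41, baseline))  # within the normal operating envelope
print(out_of_norm(75, baseline))  # well outside it
```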

Applications Management

There are client-side agents that continuously monitor the performance and availability of applications from the end user’s perspective. A just-in-time applications performance management capability captures detailed diagnostic information at the precise moment when a problem or performance degradation occurs, pinpointing the source of the problem so it can be resolved immediately.

Such agents are installed on clients as well as application servers. They monitor every transaction that crosses the user desktop, traversing networks, application servers, and database servers. They monitor all distributed applications and environmental conditions in real-time, comparing actual availability and performance with service-level thresholds.

This analysis enables network and application managers to understand the source of application response time problems by breaking down response times into network, application, and server components. As a result, troubleshooting that sometimes takes weeks can be accomplished in a matter of minutes.
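The breakdown of response time into network, application, and server components can be illustrated with simple arithmetic. The sketch below assumes the three component times have already been measured by agents; the function name and the millisecond figures are hypothetical.

```python
def dominant_component(network_ms, application_ms, server_ms):
    """Return (total, share of each component, name of the largest contributor)."""
    parts = {
        "network": network_ms,
        "application": application_ms,
        "server": server_ms,
    }
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    shares = {name: ms / total for name, ms in parts.items()}
    worst = max(shares, key=shares.get)
    return total, shares, worst

# One slow transaction: most of the delay turns out to be on the server.
total, shares, worst = dominant_component(network_ms=120, application_ms=30, server_ms=450)
print(total, worst, shares[worst])
```

Knowing that three quarters of the delay sits in one component tells the manager where to start troubleshooting.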

Fault Management

When faults on the network occur, it is imperative that problems be resolved quickly to decrease the negative impact on user productivity. Network managers must be able to respond quickly and have procedures in place to reestablish lost service and maintain beneficial service levels. The following capabilities of intelligent agents can be used to gather and sort the data needed to quickly identify the cause of faults and errors on the network:

Packet interrogation: Isolates the actual conversation that is causing the network problem, allowing the network manager to get to the heart of the problem quickly.
Data correlation: Since managers cannot always be on constant watch for network faults, it is imperative to have historical data available that provides views of key network metrics at the time of the fault. What was the overall error/packet rate and the types of errors that occurred? What applications were running at the time of the fault? Which servers were most active? Which clients were accessing these active servers, and which applications were they running? Data correlation can help answer these questions.
Identification of top error generators: Identifies the network nodes that are generating the faults and contributing to problems such as bottlenecks caused by errors and network down time.
Immediate fault notification: With immediate notification of network faults, managers can instantly learn when a problem is occurring before users do. Proactive alarms help detect and solve the problem as it is happening.
Automated resolution procedures: Intelligent agents can be configured to automatically fix the problem when it occurs. The agent can even be programmed to automatically e-mail or notify help desk personnel with instructions on how to solve the problem, thus saving time and money.
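Identifying the top error generators is essentially a counting exercise over the error records an agent has collected. The following sketch assumes each record is a (node, error type) pair; the node names and error labels are made up for the example.

```python
from collections import Counter

def top_error_generators(error_log, n=3):
    """Return the n nodes with the most logged errors, as (node, count) pairs."""
    counts = Counter(node for node, _error in error_log)
    return counts.most_common(n)

# Hypothetical error records gathered by an agent on one segment.
error_log = [
    ("ws-07", "crc"), ("ws-07", "runt"), ("srv-02", "crc"),
    ("ws-07", "crc"), ("ws-11", "late_collision"), ("srv-02", "crc"),
]
worst_nodes = top_error_generators(error_log, n=2)
print(worst_nodes)
```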

Capacity Planning and Reporting

Capacity planning and reporting services play a significant role in delivering sustainable network service levels to end users. They also provide documented proof to management and other organizations that pay for services to help ensure that network service levels are consistently achieved. Capacity planning and reporting allows for the collection and evaluation of information to make informed decisions about future network configurations, accommodating growth in client-server computing environments. The following capabilities of intelligent agents can be used to assist in managing network growth:

Baselining: Allows the network manager to determine the true operating performance of the network by comparing performance at various times, perhaps on a monthly basis, which can identify business cycle deviations.
Load balancing: Allows the network manager to compare inter-network service objectives from multiple sites at once to determine which subnets are over- or underutilized. It also helps the network manager discover which subnets can sustain increased growth and which require immediate attention.
Protocol/application distribution: Helps the network manager understand which applications have outgrown which domains or subnets. For example, these capabilities can find out if certain applications are continuously taking up more precious bandwidth and resources throughout the enterprise. With this kind of information, the network manager can better plan for the future.
Host load balancing: Allows the network manager to obtain a list of the top network-wide servers and clients using mission-critical applications. For example, the information collected from intelligent agents might reveal if specific servers always dominate precious LAN or WAN bandwidth, or spot when a central processing unit (CPU) is becoming overloaded. In either case, an agent on the LAN segment, WAN device, or host can initiate load balancing automatically when predefined performance thresholds are met. The information gathered by the agent can be used for resource planning.
Traffic profile optimization: To best guarantee service-level performance, the ability of network managers to compare actual network configurations against proposed configurations is crucial. From the information gathered and reported by intelligent agents, traffic profiles can be developed that allow what-if scenarios to be put together and tested before incurring the cost of physically redesigning the network. This takes the guesswork out of determining the best placement of client/server nodes and applications, for example.
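The load-balancing comparison in the list above can be sketched as a classification of subnets by average utilization. The 10% and 70% thresholds, the subnet names, and the function names are assumptions chosen for illustration; real thresholds would come from the organization's own service-level objectives.

```python
def classify_subnets(utilization, low=0.10, high=0.70):
    """Label each subnet by average utilization (fraction of capacity, 0.0 to 1.0)."""
    labels = {}
    for subnet, used in utilization.items():
        if used > high:
            labels[subnet] = "overutilized"
        elif used < low:
            labels[subnet] = "underutilized"
        else:
            labels[subnet] = "normal"
    return labels

def subnets_with(labels, wanted):
    """List subnets carrying a given label, in alphabetical order."""
    return sorted(name for name, label in labels.items() if label == wanted)

# Hypothetical averages reported by agents at three sites.
utilization = {"engineering": 0.82, "sales": 0.35, "warehouse": 0.04}
labels = classify_subnets(utilization)
print(subnets_with(labels, "overutilized"))   # require immediate attention
print(subnets_with(labels, "underutilized"))  # can sustain increased growth
```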

Web Traffic Management

To build Web sites for electronic commerce and other mission-critical applications, administrators are mirroring site content at additional points of presence (PoPs). This provides redundancy in case one site goes down, and enables traffic to be routed between the sites to improve overall response time. Flow management software determines to which Web server a request should be sent so the fastest service can be provided to the clients.

Resonate Inc.’s Global Dispatch, for example, integrates multiple PoPs into a single Web site resource. The company’s flow management software uses three factors to determine where to send a request: PoP availability, PoP load, and the Internet latency between the client and each PoP.

As requests are received, the Global Dispatch scheduler instructs the agents installed at each Web server to measure the latency between the PoP and the client’s local domain name system (DNS). Results are sent back to the Global Dispatch scheduler and combined with current load and availability information to return to the client the IP address (or virtual IP address) of the PoP best suited to respond. Global Dispatch stores this information in cache to enable faster response to future requests.
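A scheduler that weighs availability, load, and latency could be sketched as follows. This is not Resonate's algorithm, which the text does not specify; the scoring formula (latency in milliseconds plus a weighted load figure), the weight of 100, and the PoP records using documentation-range IP addresses are all hypothetical.

```python
def pick_pop(pops, load_weight=100.0):
    """Return the IP address of the best PoP, or None if no PoP is available.

    Each PoP is a dict with keys: name, ip, available, load (0.0 to 1.0),
    and latency_ms (measured latency to the client's local DNS).
    """
    candidates = [p for p in pops if p["available"]]
    if not candidates:
        return None
    # Lower score is better: latency plus a penalty proportional to current load.
    best = min(candidates, key=lambda p: p["latency_ms"] + load_weight * p["load"])
    return best["ip"]

pops = [
    {"name": "east", "ip": "192.0.2.10", "available": True, "load": 0.9, "latency_ms": 20},
    {"name": "west", "ip": "192.0.2.20", "available": True, "load": 0.2, "latency_ms": 60},
    {"name": "south", "ip": "192.0.2.30", "available": False, "load": 0.1, "latency_ms": 5},
]
chosen = pick_pop(pops)
print(chosen)
```

In this example the nearest PoP is down and the next nearest is heavily loaded, so the more distant but lightly loaded PoP is returned.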

A single PoP can also have multiple agents, each performing a share of the triangulation work, which minimizes scheduling overhead. The use of multiple agents is especially useful in large Web site environments, since each server eventually must be taken offline for repairs or upgrades. Flow control scheduler/agent software allows a machine to be removed from the server mix and have the traffic routed to other Web servers so users can continue to access various services.

Security Management

A properly functioning and secure corporate network plays a key role in maintaining an organization’s competitive advantage. Setting up security objectives related to network access must be considered before mission-critical applications are put in potentially compromising networked environments. Intelligent agents can help discover holes in network security by continuously monitoring network access with the following capabilities:

Monitor effects of firewall configurations: By monitoring post-firewall traffic, the network manager can determine if the firewall is functioning properly. For example, if the firewall was just programmed to disallow access of a specific protocol or external site, but the program’s syntax was wrong, the intelligent agent will report it immediately.
Show access to and from secure subnets: By monitoring access from internal and external sites to secure data centers or subnets, the network manager can set up security service-level objectives and firewall configurations based on the findings. For example, the information reported by the intelligent agent can be used to determine whether external sites should have access to the company’s database servers.
Trigger packet capture of network security signatures: Intelligent agents can be set up to issue alarms and automatically capture packets upon the occurrence of external intrusions or unauthorized application access. This information can be used to track down the source of security breaches. Some intelligent agents even have the capability to initiate a trace procedure to discover a breach’s point of origination.
Show access to secure servers and nodes with data correlation: This capability reveals which external or internal nodes are accessing potentially secure servers or nodes and identifies which applications they are running.
Show applications running on secure nets with application monitoring: This capability evaluates applications and protocol use on secure networks or traffic components to and from secure nodes.
Watch protocol and application use throughout the enterprise: This capability allows the network manager to select applications or protocols for monitoring by the intelligent agent so that the flow of information throughout the enterprise can be viewed. For example, this information can identify who is browsing the Web, accessing database client-server applications, or using the e-mail system.

Some agents are capable of taking action based on the nature of the security threat. Symantec, for example, offers its Intruder Alert, which uses a real-time, manager-agent architecture to monitor the audit trails of distributed systems for “footprints” that indicate suspicious or unauthorized activity on all major operating systems, Web servers, firewalls, routers, applications, databases, and SNMP traps from other network devices. Instead of reporting suspicious activity hours or even days after it occurs, Intruder Alert instantly takes action to alert IT managers, shut systems down, terminate offending sessions, and take other steps to stop intrusions before they damage critical systems.

Typically, an organization would use either a network-based intrusion detection system to monitor only a handful of key facilities that transport sensitive information, or a host-based solution that places monitoring agents on the systems that host critical applications and store vital data. By adding a network-based manager-agent component called NetProwler to Intruder Alert, Symantec is able to offer a combined approach to intrusion detection. From within Intruder Alert’s management interface, administrators can view multiple NetProwler events and hundreds of Intruder Alert agents, enabling them to react to either network- or host-based violations from a single console.

Local Area Networks

Introduction

Local area networks (LANs) came into widespread use in the 1980s, starting with Ethernet and quickly followed by token ring and others. They enable members of an organization to share databases, applications, files, messages, and resources such as servers, printers, and Internet connections. The promised benefits of LANs are often too compelling to ignore: improved productivity, increased flexibility, and cost savings. These benefits sparked the initial move from mainframe-based data centers to a more distributed model of computing, which continues today. The impetus for this “downsizing” can come from several directions, including:

Senior management, who are continuously looking for ways to streamline operations to improve financial performance.
End users, who are becoming more technically proficient, resent the gatekeeper function of data center staff, and want immediate access to data that they perceive as belonging to them. In the process, they benefit from being more productive and from being able to make better and faster decisions, which comes from increased job autonomy.
IT management, who are responding to budget cutbacks or scarce resources, and are looking for ways to do more using less powerful computers.

From each of these perspectives, LANs represent the most feasible solution. With PCs now well entrenched in corporate offices, individuals, work groups, and departments have become acutely aware of the benefits of controlling information resources and of the need for data coordination. In becoming self-sufficient and being able to share resources via LANs, users have become empowered to better control their own destinies within the organization. For instance, they can increase the quality and timeliness of their decision making, execute transactions faster, and become more responsive to internal and external constituencies, all without the need to confront a gatekeeper in the data center.

In many cases, this arrangement has the potential of moving accountability to the lowest common point in the organization, where many end users think it properly belongs. This scenario also has the potential of peeling back layers of bureaucracy that have traditionally stood between users and centralized resources. IT professionals eventually discovered that it was in their best interest to gain control over LANs, enabling them to justify their existence within the organization by using their technical expertise to keep LANs secure and operating at peak performance. Further, there was the need to assist users, who were not technically savvy. Rendering assistance helped companies get the most out of their technology investments.

Ethernet

Ethernet is a type of LAN that uses a contention-based method of access to allow computers to share resources, send files, print documents, and transfer messages. The Ethernet LAN originated as a result of the experimental work done by Xerox Corporation at its Palo Alto Research Center (PARC) in the mid-1970s and quickly became a de facto standard with the backing of Digital Equipment Corp. (DEC) and Intel Corp. Xerox licensed Ethernet to other companies that developed products based on the specification issued by the three companies. Much of the original Ethernet design was incorporated into the 802.3 standard adopted in 1983 by the Institute of Electrical and Electronics Engineers (IEEE).

Ethernet is contention-based, meaning that stations compete with each other for access to the network, a process that is controlled by a statistical arbitration scheme. Each station “listens” to the network to determine if it is idle. Upon sensing that no traffic is currently on the line, the station is free to transmit. If the network is already in use, the station backs off and tries again. If multiple stations sense that the network is idle and transmit at the same time, a “collision” occurs and each station backs off to try again at staggered intervals. This media access control scheme is known as carrier sense multiple access with collision detection (CSMA/CD).
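The "staggered intervals" mentioned above come from a randomized backoff. The text does not spell out the rule; the sketch below uses the truncated binary exponential backoff defined in IEEE 802.3, in which a station waits a random number of slot times after each collision and abandons the frame after 16 attempts. The function name and the use of Python's random module are illustrative choices.

```python
import random

MAX_ATTEMPTS = 16   # the frame is abandoned after this many transmission attempts
BACKOFF_LIMIT = 10  # the random range stops doubling after the 10th collision

def backoff_slots(collisions, rng=random):
    """Slot times to wait after the given number of successive collisions."""
    if collisions < 1:
        raise ValueError("backoff applies only after at least one collision")
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame is discarded")
    k = min(collisions, BACKOFF_LIMIT)
    # Choose uniformly from 0 .. 2**k - 1, so the range doubles with each collision.
    return rng.randrange(2 ** k)

# Two stations that collide once each pick 0 or 1 slots, so they often separate;
# if they collide again, the range widens to 0..3, then 0..7, and so on.
for n in (1, 2, 3, 10, 15):
    print(n, backoff_slots(n))
```

Because the waiting range grows with each successive collision, repeated collisions between the same stations become progressively less likely.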

1.2.1 Frame Format

The IEEE 802.3 standard defines a multi-field frame format, which differs only slightly from that of the original version of Ethernet, known as “pure” Ethernet (see Figure 1.1):

Preamble. The frame begins with an 8-byte preamble: 56 bits of alternating 1 and 0 values, which are used for synchronization, followed by the 1-byte start frame delimiter, which marks the start of the frame. The same bit pattern used in the pure Ethernet preamble is used in the IEEE 802.3 preamble, which includes the 1-byte start frame delimiter field.
Start frame delimiter. The IEEE 802.3 standard specifies a start frame delimiter field, which is really a part of the preamble. This is used to indicate the start of a frame.
Address fields. The destination address field identifies the station(s) that are to receive the frame. The source address field identifies the station that sent the frame. If addresses are locally assigned, the address field can be either 2 bytes (16 bits) or 6 bytes (48 bits) in length. A destination address can refer to one station, a group of stations, or all stations. The original Ethernet specifies the use of 48-bit addresses, while IEEE 802.3 permits either 16- or 48-bit addresses.
Length count. The length of the data field is indicated by the 2-byte count field. This IEEE 802.3-specified field is used to determine the length of the information field when a pad field is included in the frame.
Pad field. To detect collisions properly, the frame that is transmitted must contain a certain number of bytes. The IEEE 802.3 standard specifies that if a frame being assembled for transmission does not meet this minimum length, a pad field must be added to bring it up to that length.
Type field. Pure Ethernet does not support length and pad fields, as does IEEE 802.3. Instead, 2 bytes are used for a type field. The value specified in the type field is only meaningful to the higher network layers and was not defined in the original Ethernet specification.
Data field. The data field of a frame is passed by the client layer to the data link layer in the form of 8-bit bytes. The minimum frame size is 72 bytes, while the maximum frame size is 1,526 bytes, including the preamble. If the data to be sent uses a frame that is smaller than 72 bytes, the pad field is used to stuff the frame with extra bytes. In defining a minimum frame size, there are fewer problems to contend with in handling collisions. If the data to be sent uses a frame that is larger than 1,526 bytes, it is the responsibility of the higher layers to break it into individual packets in a procedure called “fragmentation.” The maximum frame size reflects practical considerations related to adapter card buffer sizes and the need to limit the length of time the medium is tied up in transmitting a single frame.
Frame check sequence. A properly formatted frame ends with a frame check sequence, which provides the means to check for errors. When the sending station assembles a frame, it performs a cyclic redundancy check (CRC) calculation on the bits in the frame. The sending station stores the results of the calculation in the 4-byte frame check sequence field before sending the frame. At the receiving station, an identical CRC calculation is performed and a comparison made with the original value in the frame check sequence field. If the two values do not match, the receiving station assumes that a transmission error has occurred and requests that the frame be retransmitted. In pure Ethernet, there is no provision for error correction; if the two values do not match, notification that an error has occurred is simply passed to the client layer.
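The field layout above can be checked with a short sketch that assembles an IEEE 802.3-style frame with 6-byte addresses. It assumes the 46-byte minimum and 1,500-byte maximum data field that give the 72-byte and 1,526-byte frame sizes quoted above, represents the preamble with the byte values commonly shown for it (0x55 and 0xD5), and uses Python's zlib.crc32 as the CRC-32 calculation. The function name build_frame and the byte order chosen for the frame check sequence are assumptions of this example.

```python
import struct
import zlib

# 56 alternating bits followed by the start frame delimiter (8 bytes in total).
PREAMBLE_SFD = bytes([0x55] * 7 + [0xD5])
MIN_DATA, MAX_DATA = 46, 1500  # data field limits in bytes

def build_frame(dst, src, data):
    """Assemble preamble, addresses, length count, data, pad, and frame check sequence."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("this sketch uses 6-byte (48-bit) addresses")
    if len(data) > MAX_DATA:
        raise ValueError("data too long: higher layers must split it")
    pad = bytes(max(0, MIN_DATA - len(data)))  # zero bytes up to the minimum
    body = dst + src + struct.pack("!H", len(data)) + data + pad
    fcs = struct.pack("<I", zlib.crc32(body))  # 4-byte CRC over the body
    return PREAMBLE_SFD + body + fcs

broadcast = b"\xff" * 6
sender = bytes.fromhex("020000000001")  # hypothetical locally administered address

print(len(build_frame(broadcast, sender, b"hi")))         # shortest frame
print(len(build_frame(broadcast, sender, bytes(1500))))   # longest frame
```

The length count records the real data length (2 bytes in the first case), which is how a receiver can tell data from padding.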

1.2.2 Media Access Control

Several key processes are involved in transmitting data across the network. Among them are data encapsulation/decapsulation and media access management, both of which are performed by the media access control (MAC) sublayer of the Open Systems Interconnection (OSI) data link layer.

Data Encapsulation/Decapsulation

Data encapsulation is performed at the sending station. This process entails adding information to the beginning and end of the data unit to be transmitted. The data unit is received by the MAC sublayer from the logical link control (LLC) sublayer. The added information is used to perform the following tasks:

Synchronize the receiving station with the signal;
Indicate the start and end of the frame;
Identify the addresses of sending and receiving stations;
Detect transmission errors.

The data encapsulation function is responsible for constructing a transmission frame in the proper format. The destination address, source address, type and information fields are passed to the data link layer by the client layer in the form of a packet. Control information necessary for transmission is encapsulated into the offered packet. The CRC value for the frame check sequence field is calculated, and the frame is constructed.

When a frame is received, the data decapsulation function performed at the receiving station is responsible for recognizing the destination address, performing error checking, and then removing the control information that was added by the data encapsulation function at the sending station. If no errors are detected, the frame is passed up to the LLC sublayer.

Specific types of errors are checked in the decapsulation process, including whether the frame is a multiple of 8 bits or exceeds the maximum packet length. The address is also checked to determine whether the frame should be accepted and processed further. If it is, a CRC value is calculated and checked against the value in the frame check sequence field. If the values match, the destination address, source address, type and data fields are passed to the client layer. What is passed to the client station is the packet in its original form.
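
The receive-side checks described above might look like the following Python sketch. It is an illustration under stated assumptions rather than a real driver: the function name decapsulate is hypothetical, the size limits are the 72-byte and 1,526-byte figures minus the 8-byte preamble, only the station's own address and the broadcast address are accepted (multicast is ignored), and the CRC is assumed to be stored little-endian. Because Python handles whole bytes, the "multiple of 8 bits" check is satisfied implicitly.

```python
import struct
import zlib

MIN_FRAME = 64     # 72 bytes minus the 8-byte preamble
MAX_FRAME = 1518   # 1,526 bytes minus the 8-byte preamble
BROADCAST = b"\xff" * 6


def decapsulate(frame: bytes, my_addr: bytes):
    """Return (dst, src, ethertype, data), or None if the frame is rejected."""
    # Length check: undersized or oversized frames are discarded.
    if len(frame) < MIN_FRAME or len(frame) > MAX_FRAME:
        return None
    # Address check: accept only frames addressed to this station or to all.
    dst = frame[0:6]
    if dst != my_addr and dst != BROADCAST:
        return None
    # Error check: recompute the CRC and compare with the stored FCS.
    body, fcs = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body)) != fcs:
        return None
    src = frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    # The data field is returned with any pad bytes still attached.
    return dst, src, ethertype, body[14:]
```

Returning None for every rejection keeps the sketch short; a fuller version would likely report which check failed so that the error can be counted or passed to the client layer.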

Media Access Management

The method used to control access to the transmission medium is known as media access management. It is responsible for several functions, starting with collision handling, which are defined by the IEEE 802.3 standard for contention networks. There are two collision-handling schemes: one for detection and one for avoidance.

With detection (i.e., CSMA/CD), collisions occur when two or more frames are offered for transmission at the same time, which triggers the transmission of a sequence of bits called a “jam.” This is the means whereby all stations on the network recognize that a collision has occurred. At that point, all transmissions in progress are terminated. Retransmissions are attempted at calculated intervals. If there are repeated collisions, a process called “backing off” is used, which involves increasing the retransmission wait time following each successive collision.
With collision avoidance [i.e., carrier sense multiple access with collision avoidance (CSMA/CA)], the line is monitored for the presence or absence of a signal (carrier), as in CSMA/CD. But with collision avoidance, a broadcast is issued to the network notifying other stations that a data transmission is about to occur. While CSMA/CA is effective at avoiding collisions on a network, it has an additional overhead requirement that CSMA/CD does not. This results in CSMA/CA increasing network traffic because it has to broadcast the intent of the station to transmit before any real data is put onto the cable.
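
The "backing off" process used with CSMA/CD can be sketched as follows. The text above says only that the wait time grows after each successive collision; the specific rule shown here (truncated binary exponential backoff with a 512-bit slot time, a cap of 10 doublings, and a limit of 16 attempts) is the scheme commonly described for IEEE 802.3 and is stated as an assumption. The function names are hypothetical.

```python
import random

SLOT_BITS = 512      # assumed slot time, expressed in bit times
MAX_EXPONENT = 10    # the waiting range stops doubling after 10 collisions
MAX_ATTEMPTS = 16    # the frame is abandoned at the 16th collision


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Pick a random wait, in slot times, after the given collision count."""
    if collisions < 1:
        raise ValueError("collision count starts at 1")
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions; frame is discarded")
    k = min(collisions, MAX_EXPONENT)
    # Choose uniformly from 0 .. 2**k - 1, so the range doubles each time.
    return rng.randrange(2 ** k)


def backoff_seconds(collisions: int, bit_rate_bps: float, rng: random.Random) -> float:
    """Convert the chosen number of slots into seconds at a given bit rate."""
    return backoff_slots(collisions, rng) * SLOT_BITS / bit_rate_bps
```

After the first collision a station waits either 0 or 1 slot; after the third it waits between 0 and 7 slots. The random choice is what makes it unlikely that two colliding stations pick the same retransmission time again.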

On the receiving side, the management function is responsible for recognizing and filtering out fragments of frames that resulted from a transmission that was interrupted by a collision. Any frame that is less than the minimum size is assumed to be a fragment that was caused by a collision.

Fast Ethernet

100BaseT is the IEEE standard for providing 100-Mbps Ethernet performance and functionality over ubiquitously available UTP wiring. Like 10BaseT Ethernet, this standard specifies a star topology. The need for 100 Mbps came about as a result of the emergence of data-intensive applications and technologies such as multimedia, groupware, imaging, and the explosive growth of high-performance database software packages on PC platforms. All of these tax today’s client-server environments and demand even greater bandwidth for improved response time.

1.4.1 Compatibility

Also known as Fast Ethernet, 100BaseT uses the same contention-based MAC method—CSMA/CD—that is at the core of IEEE 802.3 Ethernet. The Fast Ethernet MAC specification simply reduces the “bit time”—the time duration of each bit transmitted—by a factor of 10, enabling a 10-fold boost in speed over 10BaseT. Fast Ethernet’s scaled CSMA/CD MAC leaves the remainder of the MAC unchanged. The packet format, packet length, error control, and management information in 100BaseT are all identical to those used in 10BaseT.
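
The effect of the shorter bit time can be worked out with a few lines of arithmetic. The sketch below uses hypothetical helper names and takes the 1,526-byte maximum frame size (including preamble) from the earlier frame format discussion.

```python
def bit_time_ns(rate_mbps: float) -> float:
    """Duration of one bit in nanoseconds at the given data rate."""
    return 1000.0 / rate_mbps


def frame_time_us(frame_bytes: int, rate_mbps: float) -> float:
    """Time in microseconds that a frame occupies the medium."""
    return frame_bytes * 8 / rate_mbps


if __name__ == "__main__":
    for rate in (10, 100):
        print(rate, "Mbps:", bit_time_ns(rate), "ns per bit,",
              frame_time_us(1526, rate), "us per maximum frame")
```

The bit time drops from 100 ns at 10 Mbps to 10 ns at 100 Mbps, so a maximum-size frame that ties up the medium for about 1,221 microseconds on 10BaseT needs only about 122 microseconds on 100BaseT.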

Since no protocol translation is required, data can pass between 10BaseT and 100BaseT stations via a hub equipped with a 10/100-Mbps bridge module. Both technologies are also full-duplex capable, meaning that data can be sent and received at the same time. This compatibility enables existing LANs to be inexpensively upgraded to the higher speed as demand warrants.

1.4.2 Media Choices

To ease the migration from 10BaseT to 100BaseT, Fast Ethernet can run over Category 3, 4, or 5 UTP cables while preserving the critical 100-meter (330-foot) segment length between hubs and end stations. The use of fiber allows even more flexibility with regard to distance. For example, the maximum distance from a 100BaseT repeater to a fiber-optic bridge, router, or switch using fiber-optic cable is 225 meters (742 feet). The maximum fiber distance between bridges, routers, or switches is 450 meters (1,485 feet). When the link is configured for full-duplex operation, which removes the collision-timing limit, the maximum fiber distance between bridges, routers, or switches is 2 km (1.2 miles). By interconnecting repeaters with other internetworking devices, large, well-structured networks can be easily created with 100BaseT. The types of media used to implement 100-Mbps Ethernet are summarized as follows:

100BaseTX: A two-pair system for data grade (EIA 568 Category 5) UTP and shielded twisted-pair (STP) cabling.
100BaseT4: A four-pair system for both voice and data grade (Category 3, 4, or 5) UTP cabling.
100BaseFX: A multimode two-strand fiber system.

Together, the 100BaseTX and 100BaseT4 media specifications cover all cable types currently in use in 10BaseT networks. Since 100BaseTX, 100BaseT4, and 100BaseFX systems can be mixed and interconnected through a hub, users can retain their existing cabling infrastructure while migrating to Fast Ethernet.

100BaseT also includes a media-independent interface (MII) specification, which is similar to the 10-Mbps attachment unit interface (AUI). The MII provides a single interface that can support external transceivers for any of the 100BaseT media specifications.

Unlike other high-speed technologies, Ethernet has been installed for over 20 years in business, government, and educational networks. The migration to 100-Mbps Ethernet is made easier by the compatibility of 10BaseT and 100BaseT technologies, making it unnecessary to alter existing applications for transport at the higher speed. This compatibility allows 10BaseT and 100BaseT segments to be combined in both shared and switched architectures, allowing network administrators to apply the right amount of bandwidth easily, precisely, and cost-effectively. Fast Ethernet is managed with the same tools as 10BaseT networks, and no changes to current applications are required to run them over the higher speed 100BaseT network.

Gigabit Ethernet

Ethernet is a highly scalable LAN technology. It has long been available in two versions, 10-Mbps Ethernet and 100-Mbps Fast Ethernet, and the next version standardized by the IEEE offers another order-of-magnitude increase in bandwidth. Offering a raw data rate of 1,000 Mbps or 1 Gbps, the so-called Gigabit Ethernet uses the same frame format and size as previous Ethernet technologies. It also maintains full compatibility with the huge installed base of Ethernet nodes through the use of LAN hubs, switches, and routers.

Gigabit Ethernet supports full-duplex operating modes for switch-to-switch and switch-to-end-station connections and half-duplex operating modes for shared connections using repeaters and the CSMA/CD access method. Figure 1.2 illustrates the functional elements of Gigabit Ethernet.

The initial efforts in the standards process drew heavily on the use of Fibre Channel and other high-speed networking components. Fibre Channel encoding/decoding integrated circuits and optical components were readily available and are specified and optimized for high performance at relatively low costs. The first implementations of Gigabit Ethernet employed Fibre Channel’s high-speed, 780-nm (short wavelength) optical components for signaling over optical fiber and 8B/10B encoding/decoding schemes for serialization and deserialization. Fibre Channel technology operating at 1.063 Gbps was enhanced to run at 1.250 Gbps, thus providing the full 1,000-Mbps data rate for Gigabit Ethernet. Link distances—up to 2 km over single-mode fiber and up to 550 meters over 62.5-micron multimode fiber—were specified as well.
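
The relationship between the 1.250-Gbps signaling rate and the 1,000-Mbps data rate follows from the 8B/10B code, in which every 8 data bits are sent as a 10-bit code group. A short sketch of the arithmetic, with a hypothetical function name:

```python
def payload_rate_mbps(line_rate_mbps: float, data_bits: int = 8,
                      code_bits: int = 10) -> float:
    """Usable data rate after block-coding overhead is removed."""
    return line_rate_mbps * data_bits / code_bits


if __name__ == "__main__":
    print(payload_rate_mbps(1250))  # Gigabit Ethernet signaling rate
    print(payload_rate_mbps(1063))  # Fibre Channel rate quoted in the text
```

At 1,250 Mbps on the line, the code leaves exactly 1,000 Mbps for data. The unmodified Fibre Channel rate of 1.063 Gbps would have delivered only about 850 Mbps, which is why the signaling rate had to be raised.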

In mid-1999, the IEEE and the Gigabit Ethernet Alliance formally ratified the standard for Gigabit Ethernet over copper. The IEEE 802.3ab standard defines Gigabit Ethernet operation over distances of up to 100 meters (330 feet) using four pairs of Category 5 balanced copper cabling. The standard adds a Gigabit Ethernet physical layer to the original 802.3 standard, allowing for the higher speed over the existing base of Category 5 UTP wiring. It also allows for auto-negotiation between 100-Mbps and 1,000-Mbps equipment. Table 1.1 summarizes Gigabit Ethernet standards for various media.

Table 1.1: A Summary of Gigabit Ethernet Standards for Various Media (Source: IEEE 802.3z Gigabit Task Force)
Specification	Transmission Facility	Purpose
1000BaseLX	Long-wavelength laser transceivers	Support links of up to 550m of multimode fiber or 3,000m of single-mode fiber
1000BaseSX	Short-wavelength laser transceivers operating on multimode fiber	Support links of up to 300m using 62.5-micron multimode fiber or links of up to 550m using 50-micron multimode fiber
1000BaseCX	STP cable spanning no more than 25m	Support links among devices located within a single room or equipment rack
1000BaseT	UTP cable	Support links of up to 100m using four-pair Category 5 UTP

The initial applications for Gigabit Ethernet will be for campuses or buildings requiring greater bandwidth between routers, switches, hubs and repeaters, and servers. Examples include switch-to-router, switch-to-switch, switch-to-server, and repeater-to-switch links.