By Ken Ferderer
Since the beginning of network computing, networks have been designed as a collection of independent devices that have a limited awareness of each other, and no awareness of the collective whole outside of the basic routing fabric. As a result, networks have always had a limited ability to dynamically adjust to problems or events occurring within the network and are incapable of adjusting to any changes outside the network.
Traditionally, network engineers have attempted to build around these shortcomings with redundant routes, standby devices, and other mechanisms that introduce some basic resiliency into the network. These simple workarounds have always raised larger questions:
- Is it possible for a network and its collection of independent devices to react appropriately to changes in the environment without manual intervention?
- Could a network ever recognize an event outside of its routing fabric that requires a change to its behavior or to the operation of the collective whole?
- And based on what happens, could it modify the behaviors of multiple independent devices to accommodate that event?
Take as an example a sophisticated emergency response network, which links together special sensors that detect fires and chemical, radiological or nuclear threats. Based on the ‘type’ of event recognized by the sensor arrays, the underlying logical network must be dynamically reconfigured into one of any number of pre-defined configurations.
In one such scenario, the sensors may report that a fire has been detected and, based on this event, the network should immediately alter its current logical configuration to provide secure connectivity between first responders, including police and fire departments, local authorities, and local news agencies. If, however, a chemical or nuclear event is detected, the underlying network should instantaneously reconfigure itself to securely connect all federal response agencies and command and control, and to route around any unresponsive sites.
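To make the scenario concrete, the mapping from sensor event to pre-defined configuration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `EventType`, `NetworkProfile`, `PROFILES` and `select_profile` are hypothetical, the site labels are paraphrased from the scenario above, and "routing around" an unresponsive site is reduced to dropping it from the set of sites to connect. The scenario does not say which configuration a radiological event selects, so that case is left out.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    """Event types named in the scenario (hypothetical labels)."""
    FIRE = "fire"
    CHEMICAL = "chemical"
    NUCLEAR = "nuclear"


@dataclass(frozen=True)
class NetworkProfile:
    """A pre-defined logical configuration: a name plus the sites to connect securely."""
    name: str
    peers: frozenset


# Assumed profiles, paraphrased from the two cases described in the text.
LOCAL_RESPONSE = NetworkProfile(
    "local-response",
    frozenset({"police", "fire-department", "local-authorities", "local-news"}),
)
FEDERAL_RESPONSE = NetworkProfile(
    "federal-response",
    frozenset({"federal-response-agencies", "command-and-control"}),
)

# Event type -> pre-defined configuration.
PROFILES = {
    EventType.FIRE: LOCAL_RESPONSE,
    EventType.CHEMICAL: FEDERAL_RESPONSE,
    EventType.NUCLEAR: FEDERAL_RESPONSE,
}


def select_profile(event, unresponsive=frozenset()):
    """Pick the profile for an event and exclude sites reported as unresponsive.

    Excluding a site is a stand-in for routing around it; a real network
    would also have to recompute paths between the remaining sites.
    """
    profile = PROFILES[event]
    reachable = profile.peers - frozenset(unresponsive)
    return NetworkProfile(profile.name, reachable)


if __name__ == "__main__":
    chosen = select_profile(EventType.NUCLEAR, unresponsive={"command-and-control"})
    print(chosen.name, sorted(chosen.peers))
```

The point of the sketch is that the decision itself is a simple lookup; the hard part, as the following paragraphs argue, is getting a collection of independent devices to act on that decision together.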
To date, this type of dynamic network-level reconfiguration – or self-healing – based on a non-network event has simply not been possible. Even network management solutions, which by definition provide a broader view of the deployed network environment, can recognize only limited network-level events, such as dropped routes, throughput issues, and device failures. These solutions have a very limited capability to automatically react to, or recover from, any type of network event. In fact, most network management solutions are only