Pipeline Publishing, Volume 3, Issue 11- Wedge Greene and Barbara Lancaster, LTC International

Pipeline Publishing, Volume 3, Issue 11

	This Month's Issue:
	The Long Arm of Telecommunications Law

Carrier Grade: The Myth and the Reality of Five Nines

article page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

did not count. And even above this, scheduled maintenance down time did not count toward availability. The Voice NOC had seemingly qualified away rigor in order to meet the mandated goal of five-nines. (Once again proving the rule that you get what you measure.)

Essentially the Voice NOC was not wrong, nor the Data division right. What the NOC was measuring was compliance with a customer-oriented SLA, a Service Level Agreement, and not really an availability measure. The Data division was measuring availability as “measured uptime,” or the probability the network could be used at any specific time the customer wished to use it. Today we clearly understand the difference between SLAs and Availability, and define them in separate and individually appropriate ways. So the justification session worked, and Wedge Greene henceforth started attending standards meetings as an Operations guy.

High availability is consistently used by telecommunications vendors to describe their products, perhaps even more often than the umbrella term “carrier grade”.

In practice, these numbers are mostly estimates that the manufacturer makes about the reliability of their equipment. Because telecom equipment frequently is expensive, not deployed in statistically significant sample set sizes, and rushed into service as soon as it passes laboratory tests, the manufacturer estimates its reliability.

It is not clear when the measure of reliability of an individual network element became the measure of overall network availability – but it did: customers don’t care if one element fails, or dozens fail. Customers only care that the service they are paying for and rely on works as offered. It is also interesting to note that this five-nines either transferred from network elements to computing systems. Today

Hardware Origins

Let us backtrack a bit. It is likely that the origin of five-nines availability comes from design specifications for network elements. It also seems likely that the percent measure came before the current MTBF (Mean Time Between Failures) and MTTF (Mean Time To Fix) measurement, since it is a simply expressed figure and the MTTF requirements often match the % calculation while being expressed as an odd-ish number. However, 99.999% is not so accurate, fundamentally when you examine it closely, because of the fuzziness of the definition of availability. So in MIL-HDBK-217 and Bellcore/Telecordia TR332 they standardized these measures. The basic hardware design measures became:

MTBF - the average time between failures in hardware modules

MTTR - is the time taken to repair a failed hardware module.

computer server reliability is critical to the network availability, so it is actually convenient that both seek the same standard for describing quality.

Defining Availability

IEEE Reliability Society - Reliability Engineering subgroup defines “Reliability [as] a design engineering discipline which applies scientific knowledge to assure a product will perform its intended function for the required duration within a given environment.” Reliability is the flip side of the availability coin.

From a common sense perspective, the meaning of availability is clear. But when you want to measure it, and then hold someone to task for delivering that availability, you must define an operational definition for it. Federal

article page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

© 2006, All information contained herein is the sole property of Pipeline Publishing, LLC. Pipeline Publishing LLC reserves all rights and privileges regarding
the use of this information. Any unauthorized use, such as copying, modifying, or reprinting, will be prosecuted under the fullest extent under the governing law.