|
article page | 1 | 2 |
3 | 4 | 5 | 6 | 7 | 8 | 9
Standard 1037C and MIL-ST-188 define
telecommunications availability as
a ratio of the time a module
can be used (if a use request existed)
over a period of time. It is a ratio
of uptime to total time. Expressed
otherwise, it is the proportion of
time a system is functioning. But
what is overlooked, and actually
somewhat funny, is that five-nines
is actually referring to a logarithmic
unit derived from the number
of nines following the decimal point.
|
|
Since 100% availability
is very difficult to achieve, SLAs
become a kind of amortization game.
|
|
However
you define it, five-nines
of availability is a
pretty tough target to
hit. This corresponds
to 5.26 minutes of unplanned downtime
in a year. It is rather
unlikely that if the
NOC engineers must be
involved in a restoration
it will take less than
6 minutes to get notification
of the outage, auto process
the alarm, display the
alert information, look
|
|
|
|
Further, is availability
an average or a probability? Generally,
if you are building a network element,
you would define availability as
a “probability of failure”.
But if you are an engineer in a NOC
running a report, you would define
availability as an average. Whereas,
according to Cisco,
the number of defects in a million
is used to calculate software availability
and time does not enter directly
into the calculation.
While NEBS is mostly concerned with
safety, the NEBS standards - Telcordia
(GR-63 CORE and GR-1089 CORE NEBS Level
3) also provide goals for central office
reliability of network hardware.
Setting Availability Requirements
Cisco says: “Increased availability
translates into higher productivity,
and perhaps higher revenues and cost
savings. Reliability implies that the
system performs its specified task
correctly; availability, on the other
hand, means that the system is ready
for immediate use. Today's networks
need to be available 24 hours a day,
365 days a year. To meet that objective,
99.999 or 99.9999 percent availability
is expected.”
You see this figure
everywhere. “Carrier grade
means extremely high availability.
In fact, the requirement for availability
in commercial telecom networks is
99.999%... and is known as fine nines
reliability.” [Carrier
Grade Voice Over IP, a book
by Daniel
Collins]
|
|
up and display any related information,
have a NOC engineer acknowledge the
alert, open a trouble ticket, diagnose
the root cause, determine the likely
fix, launch the device interaction
tools, change the necessary items
on the device, and restart the service.
Hence the overwhelming need to automate
the entire process, relying heavily
on redundancy to provide seamless
transfer from the failing element
to a healthy element. But of course,
one expects that any specific element
module will actually go more than
a year without fault, so the availability
target can be averaged out to meet
the NOC goal of five-nines.
This averaging and
predicting of element reliability
becomes very clear with another precisely
defined measure from MIL-HDBK-217
and Bellcore/Telecordia TR332:
FITS - the total number of failures
of the module in a billion hours
(i.e. 1000,000,000 hours).
Our civilization likely will not
last a billion hours which is 117,041
years, so of course this measure
must be estimated, computed from
component parts, or derived from
very large sample sets (like number
of chips produced in a year). Bellcore (Telcordia)
standards TR-332 Issue 6 and SR-332
Issue 1 provide guideline models
for predicting failure.
|
article
page | 1 | 2 | 3 |
4 | 5 | 6 | 7 | 8 | 9 | |
|
|