|
article page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
report on the quality of the early
frame relay network. A big requirement
was to measure the availability of
the network and produce one monthly
number for availability. He was a
data guy and trained as a scientist
and engineer. So he set out to “measure” availability – directly
measure availability. His team developed
automated methods of pinging every
network port and of correlating down
and up link alarms; by merging these
two methods (direct measured ping
non-
|
|
The
Voice NOC had seemingly qualified
away rigor in order to meet
the mandated goal of five-nines.
|
|
removal from service of
terminating CPE routers
at the customer presence.
He then explained the difference
between precise and accurate.
He tried to explain that
normalization (or adjustments)
was not empirically possible
and the measure should
be just one of many indications
of customer usage after
uptake. Further, he showed
that
|
|
|
|
responses and interpolated differences
in time between down link alert and
uplink alert), an automated measure
of network up time was generated. This
was sent up the management chain every
month. After a few months, down from
the management chain came the request
to generate this KPI weekly, next month
it became a demand for daily data.
Finally the newly appointed General
Manager of the division paid a personal
visit to Wedge. The GM needed to find
out why the network was “so bad”.
The automated network availability
figures were much lower than five-nines.
Five-nines availability was the corporate
standard and the accounting people
were “going to shut down the
network if the quality did not immediately
improve.” In the face of this
terrible report card, the VP of Engineering
was blaming the VP of Operations (“You
cannot run a network or fix anything
on time”) who was tossing the
responsibility right back (“You
provided us inferior equipment”).
Furthermore the Voice Network division,
using a different set of calculations,
said their network met five-nines and
this is what all the customers expect. “Your
inferior data network is dragging us
down.”
Wedge
spent a day presenting the methodology
for measuring and computing availability.
He stood firm that the measure was
absolutely precise, but not necessarily accurate since
they could not control the
|
|
the number was steadily getting
better as more units were deployed
and more customers placed commercial
applications on their VPNs (and so
kept their routers running). It was
here that Wedge learned that this
approach to calculating availability,
that the data guys had developed
from ground zero (read through IETF
group brainstorming sessions), was
not the normal telecom approach.
It seems that in order to meet the
GOAL of five-nines, which had somehow
become transferred from an individual
network element requirement to an
overall network requirement, the
voice telecom NOC had successively
kept ruling outages out of the computation
of “availability.” Voice
people reported availability out
of their trouble ticket system. There
was no direct measurement of the
network. Availability was computed
from the time a ticket was opened
until it was closed. Than came the
exclusions: only customer opened
tickets counted – proactive
restorations did not count (if the
tree fell and no one noticed before
it was propped up, it did not fall),
then the beginning of the outage
start time was from the time the
ticket was opened (a somewhat long
time from when the problem physically
occurred), and lastly, any time the
ticket was transferred to the customer
(“on the customer’s clock”)
|
article
page | 1 |
2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|
|
|