Pipeline Publishing, Volume 3, Issue 2
This Month's Issue: Time for a Check Up
Helping Systems Help Themselves

By Wedge Greene

Autonomic QoS & Collaboration Systems

Proactive monitoring and response have been a core goal of Network Operations Centers (NOCs) for as long as I’ve been a part of telecom.  But being clear about why we monitor is much more important than what we monitor.  Confuse those two drivers, and a lot of time and money is wasted, not to mention that we’ll fail to meet the customer’s need for coherent information.
A real-life example: The very first problem I was given when I moved into designing OSS systems was to “build a Frame Relay performance reporting system that will show users that data really is flowing over the pipes.”  The business requirement was to preempt calls from customers who were confused about virtual circuit turn-ups.  Soon after the first report was sent to a customer, our competitors jumped to match and better this ad hoc product feature, starting an escalation war that went way beyond the original business requirements.  So the whole data performance reporting industry was started in order to be proactive in satisfying customers.  It is rather ironic, then, that thousand-page performance reports, sorted by “top problem circuit” and listing dropped packets and low utilization, did not satisfy customers after all, since these were not actually problems.  In retrospect, we, as OSS systems designers, got this solution attempt very wrong.  But the business goal (better customer relations by being proactive) is still quite valid.

Getting it right - Efficiency: To this valid business requirement, we perhaps need to add an efficiency requirement.  Certainly, huge volumes of data sent to the customer are not good.  During the peak of the Frame Relay product competition, a service provider had masses of enterprise server clusters fed from dozens of smaller poll servers, collecting information on 100,000 circuit endpoints every 15 minutes.  This data went into expensive, near-capacity-limit database systems to generate massive email reports sent daily to customers.  One operator had two entire rooms of a data center dedicated to reporting machinery for just one network product.  Yet this was still a problem for customers, who had to hire someone to follow, interpret, and take action on these massive reports.  Again, masses of data sent to customers are not good; yet a call to a customer contact saying that their circuit seems to have issues and “we are on it” is very good.  How do we design systems that make this accurate and feasible?
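
To put rough numbers on that polling load, and to sketch the alternative of contacting a customer only when something looks persistently wrong, here is a minimal Python illustration.  The function names, the loss_threshold and consecutive_polls parameters, and the choice of a loss ratio as the indicator are all assumptions made for illustration; none of them come from the systems described above, and a real design would substitute whatever indicator actually matters to the customer.

```python
from collections import defaultdict


def samples_per_day(endpoints: int, poll_interval_minutes: int) -> int:
    """Number of poll samples collected per day across all endpoints."""
    polls_per_endpoint = 24 * 60 // poll_interval_minutes
    return endpoints * polls_per_endpoint


def circuits_needing_a_call(samples, loss_threshold=0.02, consecutive_polls=3):
    """Return circuit IDs whose indicator stayed bad for several polls in a row.

    `samples` is an iterable of (circuit_id, loss_ratio) pairs in time order.
    Both default values are hypothetical placeholders, not recommendations.
    """
    streak = defaultdict(int)  # current run of bad polls per circuit
    flagged = []
    for circuit_id, loss_ratio in samples:
        if loss_ratio > loss_threshold:
            streak[circuit_id] += 1
            # Flag once, at the moment the problem becomes "sustained".
            if streak[circuit_id] == consecutive_polls:
                flagged.append(circuit_id)
        else:
            streak[circuit_id] = 0  # a good poll resets the run
    return flagged


if __name__ == "__main__":
    # The figures quoted in the text: 100,000 endpoints, 15-minute polls.
    print(samples_per_day(100_000, 15))
```

With the figures quoted above, each endpoint is polled 96 times a day, which comes to 9.6 million samples daily.  The filter reduces that stream to a short list of circuits worth a phone call, which is the opposite of mailing the full report.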

These issues were investigated in the TeleManagement Forum (TMF) Service Level Agreement (SLA) project.  Given the maturity of thought and technology at the time, producing common definitions and expectations was the best that could be accomplished.  The SLA Handbook was a powerful artifact that enabled vendors, service providers, and customers to get on the same page when addressing problems.  It was in the TMF SLA work group that early discussions on QoS and SLAs occurred.  It was proposed that automatic notifications were possible and technically should be the standard.


It is real: It is certainly possible to build systems today which can deliver such results in real networks for real customers.  Today, these are called Autonomic Systems.  Originally, the term referred only to software that watched itself and fixed software or server problems before they impacted performance.  Often this was accomplished by automatically reloading or restarting software, or by automatically switching applications to healthy server resources.  Today, systems which do this are said to be characterized by Virtualization.  However, the advent of Pervasive Computing extended the reach of networked event-reaction systems to devices and sensors ‘at the edge.’
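
The original self-watching behavior can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: ManagedComponent, autonomic_pass, and the max_restarts limit are hypothetical names invented here, and the health check and restart action are supplied by the caller rather than taken from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ManagedComponent:
    """A piece of software the supervisor watches (hypothetical model)."""
    name: str
    is_healthy: Callable[[], bool]   # caller-supplied health probe
    restart: Callable[[], None]      # caller-supplied corrective action
    restarts: int = 0                # how many times we have restarted it


def autonomic_pass(components: List[ManagedComponent],
                   max_restarts: int = 3) -> List[Tuple[str, str]]:
    """Run one watch-and-fix cycle and report the actions taken."""
    actions = []
    for component in components:
        if component.is_healthy():
            continue  # nothing to do for a healthy component
        if component.restarts < max_restarts:
            component.restart()
            component.restarts += 1
            actions.append((component.name, "restarted"))
        else:
            # Repeated restarts have not helped, so hand over to a person.
            actions.append((component.name, "escalated"))
    return actions
```

A scheduler would call autonomic_pass at a regular interval.  The escalation branch is a design choice of this sketch: it keeps a component that will not stay up from being restarted forever without anyone noticing.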

Autonomic supply-chain logistics systems in use today can rebalance delivery manifests and shipment timing based on notifications of barcode reads on packages passing along a conveyor belt halfway around the world.  These refreshed expected-delivery notifications are automatically sent to manufacturing customers, whose systems adjust assembly schedules for their factories.  Since this works today for international shipping companies, similar new software could do the same for Telecom Service Providers.

Technical solutions alone cannot cut it: Even though the business goal is clear (proactive action for customers), it is still difficult for NOC executives and OSS designers to visualize the leap to this type of solution.  For so long, the NOC has concentrated on indirect indications of problems.  “Getting an alarm for a continuity problem and then launching a trouble response system process for fixing a down circuit” only attacks a specific cause of a problem; it does not truly reach out to the problem symptom experienced by the customer.  This worked for black & white problems: service or no service.  But even today, it is possible that the customer has switched traffic to a competitor’s network before a long outage is fixed.  With the advent of QoS, which implies measuring and reacting to grey-scales of performance, the old approaches will not work.
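
The difference between a black & white alarm and a grey-scale QoS measure can be illustrated with a short Python sketch.  The band boundaries, the grade labels, and the use of a delivered-traffic ratio as the measure are hypothetical values chosen only to show the idea; an actual service would take its bands from the SLA agreed with the customer.

```python
def binary_alarm(delivered_ratio: float) -> str:
    """Old-style view: the circuit is either carrying traffic or it is not."""
    return "up" if delivered_ratio > 0.0 else "down"


# (minimum delivered ratio, grade), checked from best to worst.
# These numbers are illustrative assumptions, not SLA recommendations.
QOS_BANDS = [
    (0.999, "meeting target"),
    (0.99, "degraded: watch"),
    (0.95, "degraded: notify customer"),
]


def qos_grade(delivered_ratio: float) -> str:
    """Grey-scale view: map a measurement onto a graded response."""
    for floor, grade in QOS_BANDS:
        if delivered_ratio >= floor:
            return grade
    return "failing: open trouble ticket"


if __name__ == "__main__":
    for ratio in (0.9995, 0.96, 0.5):
        print(ratio, binary_alarm(ratio), "|", qos_grade(ratio))
```

A circuit delivering 96 percent of its traffic still reads as "up" to the binary check, while the graded view already calls for contacting the customer.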



© 2006, All information contained herein is the sole property of Pipeline Publishing, LLC. Pipeline Publishing LLC reserves all rights and privileges regarding
the use of this information. Any unauthorized use, such as copying, modifying, or reprinting, will be prosecuted under the fullest extent under the governing law.