This reflects its role in the economic lifecycle of a service: it is generally possible to initially bring a service to market without assuring it; it’s only when service quality issues begin to hurt the bottom line through poor customer satisfaction and retention that Service Assurance finds itself in the critical path of revenues.
Pre-empting this pattern is often seen as a mere optimisation because it’s only through the cumulative effect of the pattern over time that the provider suffers long-term, large-scale damage to its competitive standing.
This attitude is changing as markets have matured, forcing providers to play the long game. Although Service Assurance was still not front-and-centre in early NFV reference architectures from the likes of ETSI, it has become much more prominent in industry dialogue as CSPs start to think seriously about fully orchestrated services in production. This in turn has led to a re-examination of how they deliver service assurance right now, and to the question of how the new technologies in play can be harnessed to provide immediate benefits.
The notion that the relationship between a correctly functioning service and the corresponding network configuration is not static – and so would benefit from being managed in a closed, automatic loop to modify the configuration to maintain the desired behaviour – isn’t new. When policy-based network management was in vogue twenty years ago, this idea was much discussed.
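To make the idea concrete, the sketch below reduces that style of policy-based management to its essentials: observe a metric, compare it to a policy target, and push a corrective configuration change. It is a minimal, hypothetical illustration only; the metric source, threshold and configuration call are assumed placeholders, not any particular product's API.

```python
import time

# Hypothetical device-level policy loop: the classic monitor/compare/adjust
# cycle of policy-based management, reduced to its simplest form.
# read_utilisation() and apply_queue_weights() stand in for whatever SNMP or
# CLI mechanism a real system would use; they are placeholders, not a real API.

UTILISATION_TARGET = 0.80          # policy: keep link utilisation below 80%
POLL_INTERVAL_SECONDS = 60

def read_utilisation(device: str, interface: str) -> float:
    """Placeholder for an SNMP poll or similar device-level measurement."""
    raise NotImplementedError

def apply_queue_weights(device: str, interface: str, weights: dict) -> None:
    """Placeholder for pushing a configuration change to a single device."""
    raise NotImplementedError

def policy_loop(device: str, interface: str) -> None:
    while True:
        utilisation = read_utilisation(device, interface)
        if utilisation > UTILISATION_TARGET:
            # The policy's single, local remedy: re-weight queues so that
            # higher-priority traffic is protected on this one interface.
            apply_queue_weights(device, interface,
                                {"voice": 50, "video": 30, "best_effort": 20})
        time.sleep(POLL_INTERVAL_SECONDS)
```

Note how the loop sees only one device and one metric; nothing in it reflects the state of the wider network or the service running over it.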
However, it was only ever implemented at the device level, never network-wide. This was mainly for two reasons: insufficient trust in automatic management software and fear of unintended emergent behaviours. Both stemmed from a lack of reliable network visibility.
Operators could not reliably know how their network was structured or how it was behaving. This body of knowledge was incomplete and fragmented, and many parts of it were manually maintained. How, therefore, they quite rightly reasoned, could an automatic control system possibly be trusted; and if it couldn't be trusted, how could they rely on it not to throw the network into chaos?
What has changed since then?
First and most obviously, operators have become more comfortable with the idea of some degree of autonomous control in the network, because it's inherent to the promised technical benefits of NFV such as elastic scaling, and because it's been proven technically, to some extent, by web-scale enterprise players.
Secondly, economic circumstances for CSPs have become tougher. Margins are under unprecedented pressure and the cost of continuing operation as before, whilst at the same time scaling the network and services to ever greater degrees, is simply unsustainable. In the medium to long term, reducing operating costs through automation is no longer an optimisation to improve profit: it is mandatory for survival.
Both these changes in circumstance clearly apply to automation across the board. However, in contrast to full service orchestration, which requires significant advances in service and resource meta-modelling, in activation and provisioning systems, and in direct support from the underlying network and compute hardware, many of the parts required for service assurance automation are already in place.
As well as the historical machinery of policy-based management, CSPs have been incrementally improving performance measurement and fault management, and in many cases have already begun to invest in analytics to make sense of the resulting increase in the volume and quality of telemetry they can harvest. They now have a good-quality, automatically curated view of what their network is doing.
However, closing the loop on service assurance – that is to say, enabling autonomous correction of poor service performance and autonomous pre-emptive change to avoid poor service in the first place – requires more than just measurement.
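By way of contrast with the device-level loop sketched earlier, the outline below suggests what closing the loop at the service level might involve: telemetry collection is only the first stage, followed by a decision step that can trigger either corrective action on an already-degraded service or a pre-emptive change when degradation is forecast. Every interface here is an assumption made for illustration; none corresponds to a specific product or standard.

```python
from dataclasses import dataclass

# Hypothetical service-level closed loop: measurement is only the first box;
# the value comes from the decision and actuation stages that follow it.
# All of the functions below are illustrative assumptions, not a real API.

@dataclass
class ServiceHealth:
    service_id: str
    kpi_margin: float        # headroom against the SLA target (negative = breached)
    predicted_margin: float  # forecast headroom over the next planning window

def collect_health(service_id: str) -> ServiceHealth:
    """Placeholder: aggregate performance and fault telemetry into service KPIs."""
    raise NotImplementedError

def corrective_action(service_id: str) -> None:
    """Placeholder: e.g. scale out a resource or reroute traffic for a degraded service."""
    raise NotImplementedError

def preemptive_action(service_id: str) -> None:
    """Placeholder: e.g. rebalance capacity before the forecast breach occurs."""
    raise NotImplementedError

def assurance_cycle(service_id: str) -> None:
    health = collect_health(service_id)        # 1. measure
    if health.kpi_margin < 0:                  # 2. decide: already degraded?
        corrective_action(service_id)          # 3a. autonomous correction
    elif health.predicted_margin < 0:          # 2. decide: degradation forecast?
        preemptive_action(service_id)          # 3b. autonomous pre-emption
    # otherwise: no change; the loop simply runs again on the next cycle
```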