By: Nick Rowan
The rise of IoT networks and systems, as well as AI and cloud computing, has meant the global telecoms sector has needed to adapt rapidly. As such, it is undergoing one of the biggest transformations since Roberts, Kahn, and Cerf’s creation of TCP/IP.
The advances made in digital technologies, coupled with rapidly evolving customer demands, have put significant pressure on telecom network companies to create faster networks with the low latencies needed for gaming, robotics, smart electricity grid control, financial trading, and more. In short, the traditional business model adopted by network operators is becoming less tenable. These organizations need to adapt to become more akin to out-and-out technology companies, with a far broader focus than they are used to. To put it another way, there needs to be a migration from being “telcos to techcos.”
This brings several benefits, not least of which is the increased monetization of 5G services, but achieving this is not without complexities. As with all major inflection points, this shift enables new players to enter the market and compete with the existing major players. Getting this migration wrong can have huge consequences.
While there are many enabling technologies on the path from telco to techco, for simplicity this article will focus on three: the rise of sophisticated AIOps to enable an automated network operations center (NOC); the development of outsourced testing and validation to bring trust when adopting equipment from lesser-known vendors; and the use of digital twins/AI scenario generation tools to better model behavior in real-world conditions.
Until now, life as a telco has been relatively straightforward. It involved the build-out of large-scale network infrastructure, with the upkeep and roll-out of next-generation technologies to ensure the connectivity provided was fast, low-latency and above all reliable. In many ways the objective has not altered, but the amount of data has increased. And it has done so dramatically. To put this in context, in 2011 Google CEO Eric Schmidt said that “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.” While I cannot comment on the validity of Schmidt’s statement, it’s now 14 years later and the data sphere is expected to reach over 180 ZB (180 billion TB) this year. At this volume, it takes roughly 15 minutes to generate 5 EB.
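As a rough check on that scale, the arithmetic can be sketched in a few lines. This assumes decimal units (1 ZB = 1,000 EB = 1 billion TB) and that the 180 ZB is generated evenly across a 365-day year; both are simplifying assumptions rather than figures from the forecast itself.

```python
# Back-of-the-envelope check, assuming 180 ZB generated evenly over one year.
ZB_PER_YEAR = 180
EB_PER_ZB = 1_000           # decimal (SI) prefixes
TB_PER_ZB = 1_000_000_000   # 1 ZB = 10^9 TB
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

tb_per_year = ZB_PER_YEAR * TB_PER_ZB
eb_per_second = ZB_PER_YEAR * EB_PER_ZB / SECONDS_PER_YEAR
seconds_for_5_eb = 5 / eb_per_second

print(f"{tb_per_year:,} TB per year")
print(f"5 EB every {seconds_for_5_eb:.0f} s (~{seconds_for_5_eb / 60:.1f} min)")
# prints:
# 180,000,000,000 TB per year
# 5 EB every 876 s (~14.6 min)
```

Under these assumptions, Schmidt's "every 2 days" has shrunk to a little under a quarter of an hour.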
Being a successful telco in this environment therefore means doing far more than providing reliable infrastructure. To supply the required bandwidth at the required latencies for video streaming, over-the-top apps and services, gaming, and (of course) AI and IoT, telcos must embrace new technologies that can operate networks at scale.
Modern networks span technologies, vendors and domains while being expected to carry ever more data. Resilience is critical to operation (and reputation), so a holistic view from the center to the edge is required. Having people manually monitor the infrastructure at this scale is expensive and inefficient. Instead, a drive towards the ‘dark’ or completely automated NOC is needed to boost reliability.
Notably, taking a dark NOC approach eliminates the siloed and manually intensive nature of running an NOC. It relies instead on AIOps tools that combine AI, machine learning, and big data analytics to not only monitor but also analyze network operations, and then optimize them in real time for resource allocation, capacity planning, and predictive maintenance. This shift swaps reactive troubleshooting for predictive, autonomous operational models. It harnesses data from switches, firewalls, servers, routers, and other equipment to identify performance trends and issues, and to correlate events across the network. This can be broken down into four key functions:
Anomaly detection: Traditional NOC systems rely on predefined thresholds for performance metrics, but “normal” network behavior is based on a swathe of constantly shifting variables. This means not all incidents will meet fixed threshold values, and not all normal events will lie within them. AIOps tools enable the dark NOC to establish a continuously adjusting baseline, building a more accurate picture of normal network behavior that fits the current, not the average, conditions.
Root cause analysis: With vast volumes of data generated comes the need for intelligent correlation systems to prevent operators becoming overwhelmed by the number of alerts and notifications. By automatically pooling related events and identifying the actual cause of an incident, not a correlated symptom, it is possible to significantly cut the mean time to resolution (MTTR) and therefore reduce service disruptions.
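To make the idea of a continuously adjusting baseline concrete, here is a minimal sketch using an exponentially weighted moving average and variance. It is not how any particular AIOps product works. The class name `AdaptiveBaseline`, the smoothing factor, the three-deviation rule, the warm-up length, the 50 ms fixed threshold and the latency samples are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveBaseline:
    """Hypothetical rolling baseline for one metric (e.g. link latency in ms)."""
    alpha: float = 0.1       # assumed smoothing factor: higher adapts faster
    threshold: float = 3.0   # assumed: flag values > 3 deviations from baseline
    warmup: int = 10         # samples to observe before flagging anything
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous against the current baseline."""
        self.count += 1
        if self.count == 1:
            self.mean = value
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        if not is_anomaly:
            # Fold normal samples into the baseline so it tracks current conditions.
            step = self.alpha * deviation
            self.mean += step
            self.var = (1 - self.alpha) * (self.var + deviation * step)
        return is_anomaly


# Illustrative latency samples (ms): quiet traffic, then one spike to 35 ms.
samples = [20, 21, 19, 20, 22, 20, 19, 21, 20, 20, 21, 19, 35, 20, 21]

FIXED_THRESHOLD_MS = 50  # assumed static alarm level
fixed_flags = [s > FIXED_THRESHOLD_MS for s in samples]

baseline = AdaptiveBaseline()
adaptive_flags = [baseline.observe(s) for s in samples]

print("fixed threshold alerts:", sum(fixed_flags))
print("adaptive alerts at indices:", [i for i, f in enumerate(adaptive_flags) if f])
```

In this sketch the 35 ms spike never reaches the fixed 50 ms alarm level, yet it stands out clearly against a baseline learned from the preceding quiet period. One design choice worth noting: flagged samples are not folded into the baseline, which keeps a spike from distorting it but also means a genuine, abrupt shift in normal behavior would need separate handling.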