
AI Traffic Transforming Networks

By: Mark Cummings, Ph.D.

Generally, we think of transformation as a process triggered by a human: people think about fundamentally new ways to solve a problem and then implement them. Today, our networks are being transformed not by human choice, but by evolving AI traffic patterns. This transformation is not a single-event response. To succeed, network leaders have to anticipate and plan for a series of changes in traffic patterns triggered by developments in GenAI and its supporting hardware.

WANs

Over the last 60 years, Wide Area Networks (WANs) have transformed from point-to-point analog, to mesh TCP/IP, to data center TCP/IP, to large numbers of data centers with local points of presence. These transformations started out as forklift migrations: physical rip and replace. The invention of SDR (Software Defined Radio) and SDN (Software Defined Networking) made later transformations less costly and disruptive. Still, changing fundamental network infrastructure architecture can be very expensive. The emergence of GenAI is creating another transformation, and network leaders need to manage the infrastructure evolution carefully to avoid costly difficulties.

Right now, a great deal of planning, investment, and implementation is focused on developing a relatively small number of extremely large AI-focused data centers. These data centers are magnets for traffic, and will become more so, attracting so much traffic that the previous WAN structure will have trouble handling it.

The WAN traffic problem is not just the amount of AI traffic, but its geographical concentration in a relatively small number of AI data centers. This forces the construction of new physical networking resources in the same area as each AI data center and in the areas feeding into it. The investments required to do this can be quite large.

AI data center concentration is a function of the providers' business models, the technology, and the applications (the uses the technology is put to). On the technology side, training requires very large and growing computing resources, all in a single data center. Because each generation of LLM (Large Language Model, the engine of GenAI) is roughly 10X larger than the previous generation, the data center required to train it grows accordingly. Training does not, however, produce extremely large WAN traffic volumes: the training corpus can be quite large, but it is moved only once into the data center where training occurs.
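As a rough illustration of why training concentrates compute inside one facility without generating comparable WAN traffic, the sketch below applies the commonly cited ~6 × parameters × tokens FLOPs heuristic to a hypothetical model and corpus. All of the numbers are illustrative assumptions, not figures from this article.

# Illustrative back-of-the-envelope numbers; every value here is an assumption.
params = 1e12            # hypothetical 1-trillion-parameter model
tokens = 10e12           # hypothetical 10-trillion-token training run
bytes_per_token = 4      # rough average for tokenized text stored compactly

# Common heuristic: training compute ~ 6 * parameters * tokens FLOPs
train_flops = 6 * params * tokens
print(f"Training compute: ~{train_flops:.1e} FLOPs (stays inside one data center)")

# The corpus crosses the WAN roughly once, before training starts
corpus_bytes = tokens * bytes_per_token
print(f"Corpus transfer: ~{corpus_bytes / 1e12:.0f} TB, moved into the data center once")

# A 10x larger next-generation model multiplies compute, not WAN traffic
print(f"Next generation (10x parameters): ~{train_flops * 10:.1e} FLOPs, corpus still moved once")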

Individual inference requests (the process of getting output from an LLM) do not require anywhere near the same level of resources. But when many simultaneous requests are being served, the resource requirement goes up accordingly. In data center implementations, this is producing very large amounts of traffic.
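A minimal sketch of how aggregate inference traffic scales with concurrency follows. The request rate, token counts, and bytes per token are hypothetical assumptions chosen only to show the arithmetic; real services also carry images, audio, and protocol overhead that push the totals higher.

# Illustrative estimate of aggregate inference traffic into one AI data center.
requests_per_second = 50_000      # hypothetical concurrent user load
prompt_tokens = 1_000             # assumed average tokens sent per request
response_tokens = 500             # assumed average tokens returned per request
bytes_per_token = 4               # rough average over the wire, uncompressed

inbound_bps = requests_per_second * prompt_tokens * bytes_per_token * 8
outbound_bps = requests_per_second * response_tokens * bytes_per_token * 8

print(f"Inbound:  ~{inbound_bps / 1e9:.1f} Gbps")
print(f"Outbound: ~{outbound_bps / 1e9:.1f} Gbps")
# Doubling the request rate doubles the WAN load converging on the same site,
# which is what concentrates traffic on a small number of locations.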

The first business model focused on chat applications: many users subscribe to a service where they can ask questions of, and carry on conversations with, a particular GenAI system. The service providers offering this inference deliver it continuously. If a data center AI is occasionally unavailable, it is an annoyance, not a crisis.

More recently, intelligent agents have been created using GenAI. Some of these agents have time-critical 24/7 responsibilities. When a recent network problem made data center AIs unavailable, it caused crisis-level problems for some of these agents.

Training is done more intermittently. From a business perspective, it is valuable to be able to use large training resources for other purposes. Some providers rent out units of resource (such as Nvidia processors) on a per-minute basis. Others provide inference services using training facilities that are not actively employed.

Large corporations are building their own AI data centers. These are set up to run models that are configured for their specific needs. This may mean custom LLMs or additional customized training of existing LLMs. It often involves very sensitive proprietary information and proprietary processing.

This kind of concentrated AI traffic coming into a relatively small number of very large AI data centers tends to produce a hub-and-spoke network architecture. It is similar to what San Francisco financial district commuter traffic produced in the San Francisco Bay Area public transportation system: a transportation network architecture dominated by the requirement to deliver very large numbers of people to and from a single center.

Emerging Edge AI

While these large data centers with hub-and-spoke networks are being developed to meet ever-growing traffic demand, Edge computing capabilities are increasing. The ability to run the largest LLMs on commodity hardware with SSD streaming is here, although with increased latency. Recently, that latency has been reduced by allowing several computers to cooperate in running a very large model. At the same time, systems are coming to market that further reduce the Edge processing limitation. Examples include Apple's M4 Pro, M4 Max, and M4 Ultra chips in MacBook Pro and Mac mini systems, soon to be followed by the M5 series, with the M6 series approximately a year away. Although Apple appears to be a leader in the Edge hardware race at this time, others are sure to rise up to challenge it.
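One rough way to reason about when an Edge machine can run a model locally versus streaming weights from SSD is to compare the size of the weights to available unified memory. The parameter counts, quantization widths, and memory size below are hypothetical assumptions, and the calculation ignores KV cache and activation memory.

# Rough check: do the model weights fit in an Edge machine's unified memory,
# or must they be streamed from SSD (at higher latency)? All values are assumptions.
def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone (ignores KV cache, activations)."""
    return params * bits_per_weight / 8

unified_memory_gb = 128           # hypothetical high-end desktop-class machine
models = {
    "70B model @ 4-bit": weight_bytes(70e9, 4),
    "405B model @ 4-bit": weight_bytes(405e9, 4),
}

for name, size in models.items():
    size_gb = size / 1e9
    mode = "fits in memory" if size_gb < unified_memory_gb else "needs SSD streaming (higher latency)"
    print(f"{name}: ~{size_gb:.0f} GB -> {mode}")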

Operation at the Edge has intrinsic advantages, including reliability, privacy/IP protection, and lower network latency. There may be financial drivers as well.


