Aside from security considerations, inference on the private data center resources of large organizations will be driven primarily by applications that require very high performance. In the public GenAI data center space, demand will similarly be driven by the need for very high performance.
Providers will promise protection against IP leakage, but organizations are likely to be skeptical of those claims. Public services will therefore generally be used only for applications where privacy and security are not significant considerations.
These high-performance LLMs will continue to grow in size and complexity. The costs of coding, training, testing, regulation, and liability will also continue to grow. The result will be a continued reduction in the number of teams developing new foundation models, and thus in the number of large data centers required for training.
An entire supporting ecosystem will be developed in the process of evolving toward this Edge-dominated use of LLMs.
The danger is that if that ecosystem does not correctly anticipate how things actually play out, assets can be stranded.
If trillions of dollars are invested in new GenAI data centers, a significant portion of the resulting assets could become stranded. That is, there would not be enough demand to generate the returns needed to pay off the loans used to create them. Such a situation would cause pain for those directly involved, and pain for society in general if enough data centers are involved that the financial failures ripple through the broader economy.
It takes time to build a new data center and an associated nuclear reactor for power. First comes the design and permitting process, then construction of the data center building; each of these can take up to a year. Filling the data center with equipment and connecting it to network resources can be done more quickly. Building a nuclear reactor takes longer than building a data center, so initially many of these new data centers may try to run on grid-supplied power. The risk can be limited somewhat by financing in tranches tied to each stage. But if much of this infrastructure is built and the users don't show up, there will not be enough revenue to repay the loans.
A possible scenario is the construction of 200 $5 billion data center/network/nuclear reactor complexes. Consider one such complex targeted at public inference and training services: the project starts in January 2024, permitting and design are completed by January 2025, building and network construction are completed by January 2026, limited service begins in June 2026, and full service begins in 2029 with the completion of the nuclear reactor.
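The scale of the exposure in this scenario can be sketched with simple arithmetic. The per-complex cost and the count of 200 come from the scenario above; the tranche split between stages and the point at which financing stops are illustrative assumptions, not figures from the text:

```python
# Back-of-envelope sketch of the 200-complex scenario.
# The $5B per complex and 200-complex count come from the scenario;
# the tranche fractions below are hypothetical assumptions.

COMPLEXES = 200
COST_PER_COMPLEX_B = 5.0  # billions of dollars per complex

# Assumed financing tranches tied to each build stage
# (fractions of total per-complex cost).
tranches = {
    "design_permitting": 0.05,
    "building_network": 0.35,
    "it_equipment": 0.30,
    "nuclear_reactor": 0.30,
}
assert abs(sum(tranches.values()) - 1.0) < 1e-9

total_b = COMPLEXES * COST_PER_COMPLEX_B  # 200 x $5B = $1,000B, i.e. $1T

# Suppose demand plateaus before the reactor tranche is drawn,
# so the reactor financing can still be cancelled.
committed_fraction = sum(v for k, v in tranches.items()
                         if k != "nuclear_reactor")
exposed_b = total_b * committed_fraction

print(f"Total planned investment: ${total_b:,.0f}B")
print(f"Exposure with reactor tranche cancelled: ${exposed_b:,.0f}B")
```

Under these assumed tranche fractions, cancelling the reactor stage still leaves roughly 70% of the planned capital committed, which is the sense in which "a large percentage of the assets financed would be stranded" even if financing is cut in time.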
Meanwhile, the Nvidia PC and Apple M4 chips come out in Q4 2025. By June 2026, inference on Edge systems has a growing market share, and demand for data center inference starts to plateau and then drop. A drop in demand for public training services is triggered by the shrinking number of teams developing new foundation LLMs, plus security concerns. There might be time to cancel financing for the nuclear reactor, but a large percentage of the assets already financed would be stranded.
Some will argue that these assets can be retargeted at other sources of demand. Others will argue that overall demand for inference will grow so fast and so large that, even after percentage reductions, the remaining demand will be sufficient to make these investments economic. Those arguments may be correct, but maybe not. There is also the threat of another disruptive technology breakthrough. Hence the need for caution.
Advances in chips designed to run GenAI on Edge devices like notebooks and smartphones herald a move of inference off data centers. Technology and market forces are reducing the number of teams developing new foundation LLMs and the concomitant demand for data center resources to support LLM training. This transition will be disruptive in a variety of ways, and this likely scenario indicates a need for caution in making large financial commitments.