By: Mark Cummings, Ph.D.
With the explosive arrival of this generation of GenAI, it is hard to think about preparing for the next disruption. Hard, but necessary. The GenAI field is ripe for innovation in two areas: software and hardware. Amid the constant stream of other AI news, the true innovation bubbling up is easy to miss. With new versions of existing LLMs (Large Language Models) appearing every few weeks, it is easy to fall into the trap of thinking that these incremental improvements are the only genuine innovations, even as we move through a period in which stumbling implementations mature into effective intelligent agents.
This ongoing stream of technical innovation also creates a great deal of anxiety about job losses from GenAI automation. Thus, it is critical to address this anxiety and the counterproductive forces it engenders.
Some may wonder whether the financial turmoil around tariffs will slow AI innovation. It is not likely to; the battle for AI supremacy is expected to keep it going. It may, however, add to the anxiety.
Successful leaders will not fall behind in the ongoing AI-driven competition. This means that those developing GenAI products need to be careful not to get so heads-down in existing development streams that they miss the next wave of disruption. Those implementing applications may have more time, since early signs of disruption can take longer to reach the stage where they are ready for application implementation. Even so, application developers should document and archive the detailed requirements they are using with this generation of GenAI. That documentation will let them deploy the next disruptive generation much faster. Leaders must also recognize the anxiety around GenAI worker displacement, and take active steps to reassure employees that they will be cared for, if they want to avoid productivity losses.
AI Technology Innovation
On the surface, current GenAI technology development is relatively incremental: ever-larger models, new tuning methods, smaller LLMs pruned from larger ones, incremental improvements to current GPU and TPU architectures, and so on. Under the surface, however, genuinely new software and hardware architectures are being developed.
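To make the pruning idea concrete, here is a minimal sketch of unstructured magnitude pruning, one common way a smaller model is carved out of a larger one. The function and parameter names are illustrative, not taken from any production toolchain.

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        # Zero the smallest-magnitude entries until `sparsity` of the
        # matrix is zero. Real pipelines typically prune structured
        # blocks and fine-tune afterward to recover accuracy.
        flat = np.abs(weights).ravel()
        k = int(flat.size * sparsity)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
        return np.where(np.abs(weights) <= threshold, 0.0, weights)

    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512))          # stand-in for one trained layer
    w_sparse = magnitude_prune(w, 0.9)
    print(f"zeroed: {np.mean(w_sparse == 0):.1%}")  # ~90%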
In the software space, there are indications of new model architectures being developed in both big companies and early-stage start-ups. One example from a large company is the work of Yann LeCun, VP and Chief AI Scientist at Meta and a professor at NYU, who is leading an effort to develop one view of the next generation of AI: the Joint Embedding Predictive Architecture (JEPA). He proposes a new paradigm in which a model includes understanding of the physical world, persistent memory, reasoning, and complex planning capabilities. He expects such models to be available for use in three to five years.
Another example is an early-stage start-up, Vital Statistics Inc., whose technology makes today's static LLMs dynamic. Once trained, today's LLMs have a static structure: a set number of columns and rows. This technology makes the shape of the model dynamic, promising dramatic improvements in inference speed and accuracy along with lower power consumption.
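The details of that technology are not public. Purely as a hypothetical illustration of what a "dynamic shape" could mean, the sketch below computes only a per-input subset of a layer's columns, so the effective width of the model varies from query to query; every name here is invented for the example.

    import numpy as np

    def dynamic_forward(x: np.ndarray, W: np.ndarray,
                        keep_cols: np.ndarray) -> np.ndarray:
        # Hypothetical: multiply against only the columns selected for
        # this input, skipping the rest of the (static) weight matrix.
        return x @ W[:, keep_cols]

    rng = np.random.default_rng(1)
    W = rng.normal(size=(1024, 4096))                  # fully trained static layer
    x = rng.normal(size=(1, 1024))                     # one input activation
    keep = rng.choice(4096, size=512, replace=False)   # this query's "shape"
    y = dynamic_forward(x, W, keep)                    # 8x fewer multiply-adds
    print(y.shape)                                     # (1, 512)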
In the hardware space, one of the key issues is increasing the efficiency of systems. The individual calculations in an LLM are relatively simple, but they are performed as part of extensive matrix computations. One way of improving efficiency is speeding up data transfer to and from processors and between processors. Abacus Semiconductor is working on technology to lower latency and power consumption in this communications process. Those working in this space are trying to make processing more efficient by improving the movement of data, rather than just building faster processors, as the back-of-the-envelope sketch below illustrates. Some other early-stage start-ups are focusing on innovations in how data is moved. Another hardware approach, being explored by an early-stage start-up, extends the architecture developed for GPUs to more general-purpose processors. Others are working on analog approaches.
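To see why data movement dominates, compare the floating-point operations in one layer's matrix multiply to the bytes that must cross the memory bus. The numbers below are illustrative, assuming fp16 weights, batch size 1, and no cache reuse.

    # Arithmetic intensity of y = x @ W for a single token (illustrative).
    d_in, d_out = 4096, 4096

    flops = 2 * d_in * d_out                  # one multiply + one add per weight
    bytes_moved = 2 * (d_in * d_out           # read W (2 bytes per fp16 value)
                       + d_in + d_out)        # read x, write y

    print(f"FLOPs: {flops:,}")                # 33,554,432
    print(f"bytes moved: {bytes_moved:,}")    # 33,570,816
    print(f"FLOPs per byte: {flops / bytes_moved:.2f}")  # ~1.0

At roughly one floating-point operation per byte, single-stream inference is memory-bound: the processor spends most of its time waiting on data, which is why faster interconnects and smarter data movement can matter more than faster arithmetic units.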
The amount of software built around Nvidia's proprietary CUDA interface (a combination of APIs and module libraries) makes life difficult for other semiconductor vendors. Groq and Cerebras are addressing this by using their proprietary chips to run data centers that offer inference as a service. Apple and Google are doing similar things: Apple uses its M-series processors in its own data centers to support the LLMs behind its services, while Google runs its own services on the TPUs it developed and offers Nvidia chips in Google Cloud for running other people's LLMs.
These are just examples; many more innovations are likely on the threshold. Traditionally, software innovation has moved faster than hardware innovation because of the cost barriers to hardware proof of concept. With GenAI, however, the cost of proving a concept has become large on the software side as well: building, training, and testing any LLM is expensive, so breakthroughs that lower those costs are an advantage here. With that said, software innovations typically still come to deployment faster than hardware innovations.