An oversimplified view of training is that it is a series of inferences. Training of a fundamental model starts with a model that has a fixed architecture and randomly assigned parameter values (currently hundreds of billions of parameters per LLM, expected to grow into the trillions). The LLM is then asked a question whose answer is already known (it is part of the training data). The answer the LLM provides is compared with the known answer, and the difference between them is measured. The function that measures this difference is often called a cost (or loss) function.
Next, the model is run backwards (backpropagation) to determine, for each parameter, how much a small change in that parameter would change the answer. The parameters are then adjusted in the direction that reduces the cost, the LLM is run again, and the cost is measured once more. This is repeated until a minimum cost is reached. The process is iterated over each of the many billions of items in the training data set, with the costs from different runs combined to guide the parameter updates.
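The loop described above can be made concrete with a minimal sketch. The code below uses PyTorch; the tiny model, the synthetic "questions" and "known answers," and the hyperparameters are illustrative placeholders, nowhere near the scale or architecture of a real LLM, but the forward pass, cost measurement, backward pass, and parameter update follow the same pattern.

```python
# Minimal sketch of the training loop described above (PyTorch).
# The model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))  # stand-in for an LLM
cost_fn = nn.CrossEntropyLoss()                    # measures the difference from the known answer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(1000):                           # iterate over the training data
    x = torch.randn(32, 16)                        # a batch of "questions"
    y = torch.randint(0, 8, (32,))                 # the known answers from the training data
    answer = model(x)                              # forward pass: the model's answer
    cost = cost_fn(answer, y)                      # compare with the known answer
    optimizer.zero_grad()
    cost.backward()                                # "run the model backwards": compute gradients
    optimizer.step()                               # adjust parameters to reduce the cost
```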
Thus, training is extremely processor intensive. Even with the most powerful GPUs, it can take many weeks of 24/7 processing in an entire large data center to train a billion-parameter LLM. Just one step down in GPU power can turn those many weeks into many months. This is the reason that generations of a particular fundamental model are typically about six months apart.
Fundamental models can be further trained. This is done by adding new material to the training data set and conducting more sets of iterations, and is sometimes called tuning. Typically, the new training data is several orders of magnitude smaller than the fundamental training data set, so tuning is far less processor intensive. Tuning may be done to strengthen the LLM's capabilities in general, or it may focus on one particular area or domain.
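A rough sketch of tuning is shown below: an existing pretrained model is loaded and training simply continues on a much smaller, domain-specific data set. The code uses the Hugging Face transformers library; the base model name ("gpt2"), the stand-in corpus, and the number of epochs are assumptions chosen only for illustration, not a recipe for production fine-tuning.

```python
# Sketch of tuning: continue training an existing pretrained model
# on a much smaller, domain-specific data set.
# The model name, corpus, and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

domain_texts = ["example proprietary or domain-specific text ..."]  # tiny stand-in corpus

model.train()
for epoch in range(3):                                       # far fewer iterations than pretraining
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt")
        outputs = model(**batch, labels=batch["input_ids"])  # causal-LM cost on the new data
        loss = outputs.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```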
Data provided in inference questions, or carried in the context fed to the attention blocks, can leak out to other users. To avoid this IP leakage, some large companies and government institutions are considering creating private GenAI systems. Companies are concerned with proprietary data leakage; governments are concerned with other types of information leakage. It also appears that some are considering creating their own custom fundamental LLMs, that is, doing their own training of models from scratch. If this comes to pass, it would reverse the decline in the number of active teams doing fundamental training. Such a reversal could create sufficient demand for the projected trillion-dollar investment in GenAI data centers.
An alternative is for each of these organizations to take an existing fundamental LLM and fine-tune it by adding training with their proprietary data. Such tuning would add only a few percentage points of new training data, so this approach would not create demand for large numbers of new data centers.
With the exception of a very few government agencies (intelligence? military?), it appears highly unlikely that organizations would have the staff talent necessary for, or could justify the expense of, creating their own fundamental LLM. Thus, most who try would fail.
Pending the next disruptive innovation on the order of the one that triggered GenAI in the first place, the technology and market forces described above indicate the following.
The overwhelming majority of inference accesses will be performed on Edge resources: starting with notebooks first appearing in late 2025, then adding locally accessed LAN servers, followed shortly thereafter by local inference on smartphones in late 2026.
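What Edge inference means in practice is that the model weights live on the local device and every prompt is processed there, so no question data leaves the machine. The sketch below illustrates this with the Hugging Face transformers pipeline; the model name is an illustrative assumption, and a notebook- or phone-class deployment would typically use a smaller, quantized model.

```python
# Minimal sketch of local ("Edge") inference: weights are downloaded once
# and all prompts are processed on the local machine.
# The model name is an illustrative assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")        # runs entirely locally
result = generator("The main advantage of edge inference is", max_new_tokens=40)
print(result[0]["generated_text"])
```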
For security reasons, organizations may restrict access to LLMs tuned with their proprietary data. The restrictions will be layered. The first layer will be access control (passwords, multi-factor authentication, etc.) to local servers and data center implementations. For more critical data, physical access controls may be deployed.