
Right LLM and Configuration
for Your Operations Agent


Here, the question is: what skills does your LLM need? In the case of LLMs, skills relate directly to the corpus (set of training data) used to train the LLM. If the skills required are within the normal types of training sets commonly used, this may not be a concern. But if your agent will be operating in a highly specialized field, or you want better performance, extra precision, etc., you may wish to do additional training. This extra training is commonly referred to as fine-tuning. Fine-tuning data sets are generally on the order of 1% of the size of the full training corpus.

Configuring LLMs for Your Requirements

The following is not an exhaustive list of configuration choices or configuration mechanisms. Here again, things are changing so fast that the available parameters, and the consequences of various settings, will change over time. As GenAI continues to evolve, however, configuration choices are not likely to disappear. So, the following examples are illustrative.

The platform you choose to support your LLM generally provides control of a number of these external variables. If you are using an online GenAI service provider’s LLM, that service provider has already chosen the platform. Some service providers will work with you to set up a version of their GenAI system on your premises or on a public cloud you select. Here again, the platform is generally chosen for you.
 
If you are running an open-source LLM in-house, on a central site or in a distributed environment, you have to choose such a platform yourself. This is generally not difficult. The most important consideration is that the platform supports the LLM you have selected. In an organization, you may wish to standardize on one particular platform for ease of maintenance, training, support, etc. If so, this may limit your LLM options.
 
One example of such an external variable is quantization. Quantization can impact the quality of GenAI output and the amount of computing resources required. Quantization refers to how many binary digits (quantity of digits) are used to encode (represent) each parameter. In a one-trillion-parameter model, reducing the quantization from 16 bits to 8 or 4 bits can have a big impact on the amount of computing resources required and on the latency (response time) of the output. Here, the question is: what level of precision does your application need? Your answer may determine how you configure the quantization variable to achieve the necessary precision at the highest possible efficiency.
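A quick back-of-the-envelope calculation shows why quantization matters at this scale. This is a minimal sketch; the parameter count is the one-trillion figure from the text, and it counts weight storage only, ignoring activations, KV caches, and other runtime overhead:

```python
# Sketch: memory needed to store model weights at different quantization
# levels. Counts weights only; real deployments need additional memory.

def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Gigabytes required to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

ONE_TRILLION = 1_000_000_000_000

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(ONE_TRILLION, bits):,.0f} GB")
# 16-bit: 2,000 GB; 8-bit: 1,000 GB; 4-bit: 500 GB
```

Halving the bit width halves the weight memory, which is why dropping from 16 to 4 bits can move a model from a multi-node cluster toward a much smaller footprint, at some cost in precision.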
 
Another external variable is temperature. Without a temperature setting, LLMs choose the highest-probability symbol in their vocabulary. For online chat applications, this was found to produce correct inferences, but less interesting ones. Temperature settings were developed to allow LLMs to select from a larger palette of high-probability symbols. You can see something similar in human-written poetry, where ending a phrase with an unexpected word can have a big impact, making for a more interesting poem. There is controversy, however, around whether temperature settings can increase the probability of hallucinations.
 
What seems clear is that it may be a mistake to take an LLM configured for one purpose (e.g., supporting online chatting) and use it for another purpose (supporting an agent) without reconfiguring it.

Development, Verification and Test

A development process common in semiconductor design is becoming common in GenAI agent development. Semiconductor design uses very specialized languages to describe in detail all the pieces of a semiconductor device, commonly called a chip. Verification is the process of testing this detailed design before it is sent to the fab (fabrication facility), where it is converted into an actual completed chip. Fab processing is very expensive. So, the objective is to avoid having to redo it by trying to find any potential problems in verification. Once the chip has been fabbed, it is tested to make sure it does what it is supposed to do and doesn’t do what it is not supposed to do. 

This two-step verification-and-test process is entering the agent development cycle. A GenAI system is used in an iterative process to evaluate GenAI-created implementations. There may be two separate GenAI systems, one for development and one for verification, working iteratively: the first develops an implementation, and the second verifies it. Here, the objective is to minimize cost by reducing the time human developers have to spend getting the GenAI-developed system to the point where it passes verification.
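The develop/verify loop described above can be sketched as follows. This is a structural sketch only: the two stand-in functions are hypothetical placeholders for what would, in practice, be calls to a developer LLM and a verifier LLM:

```python
# Sketch of the iterative develop/verify loop. Both functions below are
# hypothetical stand-ins for LLM calls.

def develop(implementation, feedback):
    # Developer stand-in: extends the implementation to address each issue.
    return implementation + [f"fix:{issue}" for issue in feedback]

def verify(implementation):
    # Verifier stand-in: returns the list of issues it finds (empty = pass).
    return [] if "fix:missing-test" in implementation else ["missing-test"]

def develop_and_verify(spec, max_iterations=5):
    implementation, feedback = [spec], []
    for _ in range(max_iterations):
        implementation = develop(implementation, feedback)
        feedback = verify(implementation)
        if not feedback:            # verifier found no issues
            return implementation  # ready for human review / fab analogue
    return implementation           # budget exhausted; humans take over

print(develop_and_verify("parse-logs"))
```

The loop terminates either when the verifier raises no issues or when the iteration budget runs out, at which point human developers step in, mirroring the chip-design goal of catching problems before the expensive stage.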

