By: Mark Cummings, Ph.D.
When planning for an AI agent to achieve operational agility and efficiency, it may be tempting to base it on the standard online version of the most leading-edge Co-Pilot or GenAI system.
Doing so can be a mistake. It is important to pick the LLM and configuration that best fit your agent. You also need a plan for monitoring agent performance and tracking the evolution of AI technology. Because LLM technology and business practices are evolving so fast, you need a way to determine when to update, with the necessary data documented and in place to do so efficiently.
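As a concrete illustration, the snippet below is a minimal sketch of how that documentation might be kept as a machine-readable baseline alongside the agent; every field name and value in it is a hypothetical example, not a prescribed format.

```python
# Minimal sketch of documenting the agent's current LLM configuration and a
# performance baseline, so a later upgrade decision can be made against data.
# All field names and values are hypothetical examples.
import json
from datetime import date

agent_baseline = {
    "recorded_on": date.today().isoformat(),
    "llm": {
        "model": "example-local-model-7b",   # hypothetical model identifier
        "quantization": "4-bit",
        "context_window_tokens": 8192,
        "temperature": 0.2,
    },
    "measured_performance": {
        "median_latency_ms": 850,
        "task_success_rate": 0.92,
        "hallucination_incidents_per_1000": 3,
    },
    "requirements": {
        "proprietary_data": True,
        "max_acceptable_latency_ms": 2000,
    },
}

# Persist the record so the next evaluation has a point of comparison.
with open("agent_baseline.json", "w") as f:
    json.dump(agent_baseline, f, indent=2)
```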
To start the process, it is good to review the application requirements. The requirements posed as questions below are particularly important for decisions around developing your agent.
Interaction Between Requirements and GenAI Selection
The first question should be: do you need GenAI at all, or can you use conventional programming? For example, if your application is essentially a fixed, well-structured decision tree (a series of if/then/else decisions), conventional programming may be the best development technique. If that is the direction chosen, attention can turn to whether and how GenAI might assist with code development.
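For instance, a rule set like the hypothetical ticket-routing logic below is exactly that kind of fixed decision tree, and conventional code handles it deterministically with no LLM in the loop.

```python
# Hypothetical example of a fixed, well-structured decision tree handled with
# conventional if/then/else logic -- no GenAI required.
def route_ticket(category: str, priority: int) -> str:
    """Return the queue a support ticket should be sent to."""
    if category == "billing":
        return "finance-queue"
    elif category == "outage":
        if priority >= 8:
            return "on-call-engineer"
        else:
            return "network-ops-queue"
    else:
        return "general-support-queue"

print(route_ticket("outage", 9))  # -> on-call-engineer
```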
If the application involves complexity, pattern recognition, dynamic behavior, and the like, and a GenAI agent approach is being explored, the next set of questions involves privacy and security.
Does your agent handle proprietary data? If so, note that many online GenAI systems collect user input and use it in training. This practice can create what has come to be called IP (Intellectual Property) leakage. In addition, there are emerging security vulnerabilities involving online AI. For example, OpenAI has been breached 1,000 times, other online AIs have been breached as well, and Microsoft Co-Pilot has had a serious vulnerability.
If privacy and security are important requirements, basing your agent on a local GenAI implementation may be a better approach. There are two related questions with a similar answer. The first
concerns uptime. Online GenAI systems are subject to periods of
unavailability.
The second concerns latency, that is, how fast your agent has to act. Processing in online systems can be fairly fast, but they face two serious latency constraints. The first is propagation delay: the time it takes for data to travel from the point of origination to the online system and back to the point of action. The second is congestion, both in the network and in the online system itself. So, if any of these three (privacy and security, uptime, or latency) is a concern, investigate the option of a local implementation.
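As an illustration of the local option, the following is a minimal sketch that assumes a locally hosted model exposing an OpenAI-compatible chat endpoint (for example, one served by llama.cpp or Ollama); the URL, model name, payload fields, and timeout are assumptions about such a setup, not a specific product's documented interface.

```python
# Sketch: calling a locally hosted LLM over an assumed OpenAI-compatible HTTP
# API and measuring round-trip latency.
import time
import requests

LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # hypothetical

def ask_local_llm(prompt: str) -> tuple[str, float]:
    """Send a prompt to the local model; return (reply_text, latency_seconds)."""
    payload = {
        "model": "local-model",   # hypothetical local model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    start = time.perf_counter()
    resp = requests.post(LOCAL_ENDPOINT, json=payload, timeout=30)
    latency = time.perf_counter() - start
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    return reply, latency

if __name__ == "__main__":
    text, seconds = ask_local_llm("Summarize today's alarm log in one sentence.")
    print(f"{seconds:.2f}s round trip: {text}")
```

Because the model runs on the same machine or local network, propagation delay and network congestion largely drop out of the latency budget, and proprietary prompts never leave your infrastructure.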
Finally, you need to determine the application's sensitivity to the quality of output. An admittedly over-simplified view breaks quality down into two factors: the probability of hallucinations and the precision of output.
When a GenAI hallucinates, the output appears to be authoritative, well-structured, and well-presented; unfortunately, it has little or no relationship to reality. The hallucination question is: how sensitive to hallucinations is your application? Are you developing a system that will advise operations staff who can deal with hallucinations themselves? Will your system include actuators that independently take action? Are those actuators able to filter out hallucinations?
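One common way to give actuators that ability is to accept only structured output that can be validated against a whitelist of allowed actions and parameter ranges, and to escalate everything else to a human. The sketch below illustrates the idea; the action names and limits are hypothetical, and a real guardrail would need more than this.

```python
# Sketch of a guardrail between the LLM and an actuator: a proposed action is
# executed only if it parses cleanly, names a whitelisted action, and stays
# within allowed parameter ranges. Action names and limits are hypothetical.
import json

ALLOWED_ACTIONS = {"restart_service", "throttle_traffic"}

def validate_action(llm_output: str) -> dict | None:
    """Return the parsed action if it passes validation, otherwise None."""
    try:
        action = json.loads(llm_output)      # require structured (JSON) output
    except json.JSONDecodeError:
        return None                          # not well-formed: reject
    if action.get("name") not in ALLOWED_ACTIONS:
        return None                          # unknown action: reject
    if action["name"] == "throttle_traffic":
        percent = action.get("percent")
        if not isinstance(percent, (int, float)) or not 1 <= percent <= 50:
            return None                      # out-of-range parameter: reject
    return action

# Anything that fails validation is escalated to a human instead of executed.
proposed = '{"name": "throttle_traffic", "percent": 120}'
print(validate_action(proposed))  # -> None, so escalate rather than act
```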
The point is, if your application is sensitive to hallucinations, you need to take steps to shield it, such as the validation layer sketched above. GenAI hallucination is an area of intense study, and progress is being made at a fast pace. At the same time, progress in other areas can work against it. For example, recently, several sources, including OpenAI, reported that early reasoning models (frontier LLMs that are designed and
trained to do something akin to human reasoning) had a
higher rate of hallucination than did
previous generations of non-reasoning models. By the time you read this, that may no longer be true.
If reasoning models continue to have a higher probability of hallucination and your application is sensitive to it, you may need to choose an LLM that does not have reasoning capability.
Some speculate that hallucinations can occur when inference requests fall outside of what the LLM has been trained on. Other aspects of performance are also affected by the training data. Thus, you may want to consider how the LLM was trained.