GenAI Agentic User Assurance



Specifying Behavior

To avoid a problem at the outset, existing GenAI guardrails and previous work on policy interfaces can be helpful. But we need a more rigorous way to tell the agents what we want them to do and not do. This is best done with explicit descriptions of objectives, algorithms, and constraints. The management-by-objectives process used in modern organizations is a useful model for specifying objectives for each member of a group of AI agents that will be working together: each agent should have a clearly defined set of objectives in explicit priority order. Algorithm specification is important both for managing the behavior of individual agents and for cooperation among groups of agents.
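
As an illustration, such a specification can be captured in code. The sketch below is a minimal Python example; the field names and the sample agent are hypothetical, not drawn from any existing framework:

    from dataclasses import dataclass, field

    @dataclass
    class AgentSpec:
        """Hypothetical specification for one GenAI agent."""
        name: str
        # Objectives in explicit priority order: index 0 is highest.
        objectives: list[str] = field(default_factory=list)
        # Named algorithm(s) the agent is permitted to use.
        algorithms: list[str] = field(default_factory=list)
        # Hard constraints: conditions the agent must never allow.
        constraints: list[str] = field(default_factory=list)

    # Example: one agent in a cooperating group of machine monitors.
    monitor_a = AgentSpec(
        name="machine-monitor-a",
        objectives=[
            "run machine A at the highest possible efficiency",
            "minimize wear on machine A",
        ],
        algorithms=["statistical-efficiency-estimator"],
        constraints=["never exceed the group power budget"],
    )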

There is also a relationship between the type of algorithm used and the objectives and constraints. An algorithm may be, for example, a mathematical or statistical process, and how it expresses its results, whether as a score, a probability, or a classification, determines how objectives and constraints can be stated. A constraint on an algorithm that reports probabilities, for instance, is naturally written as a probability threshold.

Of course, the LLM (the Large Language Model that underlies a specific GenAI system) is itself an algorithm, and the selection of which LLM to use can be a critical choice. That includes which one an agent uses when it is structured so that it has multiple LLMs to choose from, and which one is in use by another agent that a particular agent is cooperating with.
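
As a sketch of what that choice might look like in code, the Python fragment below routes tasks to one of several LLMs and prefers a cooperating peer's model when possible; the model identifiers and routing rules are hypothetical placeholders:

    from typing import Optional

    # Hypothetical routing table: task type -> preferred LLM identifier.
    LLM_ROUTES = {
        "summarization": "llm-small-fast",
        "planning": "llm-large-reasoning",
        "default": "llm-general",
    }

    def select_llm(task_type: str, peer_llm: Optional[str] = None) -> str:
        """Pick an LLM for a task; prefer the peer agent's LLM when
        cooperating, so both agents interpret instructions consistently."""
        if peer_llm is not None and peer_llm in LLM_ROUTES.values():
            return peer_llm
        return LLM_ROUTES.get(task_type, LLM_ROUTES["default"])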

Finally, constraints are statements about actions, behaviors, or conditions that must never be allowed. As an example, consider a group of three agents, each monitoring the operation of a machine driven by electricity, and each with the objective of running its machine at the highest possible efficiency. In such a situation, a constraint might be that the total power consumption of the three machines must never exceed X Watts.
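
A minimal sketch of how such a group-level constraint might be enforced, with hypothetical machine names and a placeholder budget standing in for X Watts:

    MAX_WATTS = 10_000.0  # Hypothetical group power budget ("X Watts").

    def within_power_budget(readings_watts: dict[str, float]) -> bool:
        """Group constraint: total consumption across all machines
        must never exceed the budget."""
        return sum(readings_watts.values()) <= MAX_WATTS

    def machines_to_throttle(readings_watts: dict[str, float]) -> list[str]:
        """When the constraint is violated, return the machines to
        throttle, highest consumer first."""
        if within_power_budget(readings_watts):
            return []
        return sorted(readings_watts, key=readings_watts.get, reverse=True)

    # Three agents each report their machine's current draw in Watts.
    print(machines_to_throttle({"a": 4200.0, "b": 3900.0, "c": 2500.0}))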

The Role of the Wizard

In the Sorcerer’s Apprentice, the Wizard comes in to save the day. Of course, if a wizard had been present at the beginning, there wouldn't have been a problem to start with. Following the analogy with GenAI, there has to be a “Wizard” present at the beginning, and at the end.

There has to be a process for testing a GenAI agent before it is deployed in production. Most software is tested before being introduced into production, and GenAI agents deserve at least that much scrutiny. If the normal software testing a vendor and an organization use before deployment doesn't include running the agent in a simulated environment, sometimes called a “sandbox,” that should be added. This testing must cover not only expected behavior; it must also probe for unexpected behavior.
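
A minimal sketch of what such sandbox tests might look like, using a toy agent and an illustrative allowed-action list; none of these names refer to a real testing API:

    # Hypothetical harness: every name here is illustrative.
    ALLOWED_ACTIONS = {"read_status", "adjust_setpoint", "log_event"}

    def toy_agent(scenario: dict) -> list[str]:
        """Stand-in for the real agent under test."""
        return ["read_status", "log_event"]

    def run_in_sandbox(agent, scenario: dict) -> list[str]:
        """Run the agent against a simulated environment, recording actions."""
        return agent(scenario)

    # Expected behavior: routine input produces an action from the spec.
    assert "read_status" in run_in_sandbox(toy_agent, {"input": "status check"})

    # Unexpected behavior: malformed or adversarial input must never
    # produce an action outside the allowed set.
    actions = run_in_sandbox(toy_agent, {"input": "malformed \x00 payload"})
    assert all(a in ALLOWED_ACTIONS for a in actions)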

This kind of normal testing is good, but not enough; additional precautions are prudent. When the agent is first introduced into a production environment, it needs to be monitored, something like being watched by a wizard. The “wizard” can be a manual process, an automated process, or a combination of the two. It checks that the agent is doing what it is supposed to be doing and meets expected quality metrics. In LLM terminology, “quality” has a specific meaning: the percentage of responses the LLM gets right.
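
That quality measure is simple to compute, and a wizard check can be sketched in a few lines; the 95% floor below is a hypothetical example, not a recommended value:

    QUALITY_FLOOR = 95.0  # Hypothetical acceptable quality, in percent.

    def quality(correct: int, total: int) -> float:
        """Quality in the LLM sense: percentage of correct responses."""
        return 100.0 * correct / total if total else 0.0

    def wizard_check(correct: int, total: int) -> None:
        """One observation pass of the 'wizard': flag the agent when
        measured quality falls below the acceptable floor."""
        q = quality(correct, total)
        if q < QUALITY_FLOOR:
            print(f"ALERT: quality {q:.1f}% is below the {QUALITY_FLOOR}% floor")

    wizard_check(correct=188, total=200)  # 94.0% -> triggers the alert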

Ongoing observation is also prudent. Recent reports indicate that agents can suffer degradation of quality over time. The observation wizard may be configured with a lower duty cycle later in the agent's life than at initial deployment, for example by lengthening the time between observations. Budgets and project schedules should also allow for the contingency of refreshing the agent periodically, and measurements should account for the time required to produce a refreshed agent. If that is not done, the danger is that quality falls below acceptable levels before a refreshed agent can be deployed.
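
Both ideas can be sketched together; the intervals and the simple linear model of quality decay below are hypothetical assumptions, not measured values:

    from datetime import timedelta

    INITIAL_INTERVAL = timedelta(hours=1)   # Frequent checks at launch (hypothetical).
    STEADY_INTERVAL = timedelta(hours=24)   # Lower duty cycle later (hypothetical).
    RAMP_PERIOD = timedelta(days=14)        # How long the initial cadence lasts.
    REFRESH_LEAD_TIME = timedelta(days=7)   # Assumed time to build and deploy a refresh.

    def observation_interval(agent_age: timedelta) -> timedelta:
        """Observe frequently right after deployment, then back off."""
        return INITIAL_INTERVAL if agent_age < RAMP_PERIOD else STEADY_INTERVAL

    def must_start_refresh(quality_pct: float, decay_pct_per_day: float,
                           floor_pct: float) -> bool:
        """Start a refresh early enough that quality does not fall below
        the floor before the refreshed agent can be deployed."""
        if decay_pct_per_day <= 0:
            return False  # No measured decay, so no refresh pressure yet.
        days_to_floor = (quality_pct - floor_pct) / decay_pct_per_day
        return timedelta(days=days_to_floor) <= REFRESH_LEAD_TIME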

Conclusions

We are still in the process of determining what benefits GenAI can produce and how best to realize them. GenAI agents are the current focus of that effort, and turning the push to deploy them into a good user experience may turn out to be the greatest challenge of 2025. The Sorcerer’s Apprentice problem is real. We need a formal, rigorous way to tell agents what we want them to do and not do, to make sure that when first deployed they do only what is expected, and then to follow their ongoing operation to make sure they continue to behave as expected. This is best done with explicit descriptions of objectives, algorithms, and constraints, plus an oversight mechanism.

In closing, it is always a good idea to remember that old saying: “Be careful what you ask for.”

