Governing agentic AI: what human-in-the-loop actually requires

The term "agentic AI" is used loosely, but for governance purposes it has a reasonably precise meaning: an AI system that uses an LLM or similar model to plan a sequence of actions, executes those actions using tools (web search, code execution, API calls, database writes), and iterates based on feedback from the environment — all without a human approving each step. The system has, in effect, been delegated a task rather than a prompt.

This matters for the EU AI Act because the Regulation was written with a different system architecture in mind. Most of the conformity requirements assume a system that takes an input, produces an output, and stops — one where a human can inspect the output before it has consequences. Agentic systems do not work that way. The consequences can occur during the execution sequence, before any human has had an opportunity to review anything.

What Article 14 says

Article 14 is the human oversight provision. It requires that high-risk AI systems be designed and developed such that natural persons can:

effectively oversee the system during the period of use
understand the system's capabilities and limitations
detect and address malfunctions, including unexpected outputs
intervene in or interrupt the system when necessary (Article 14(4)(e))
decide not to use the system in a particular situation (Article 14(4)(d))

These requirements apply to high-risk systems. Whether a specific agentic system is high-risk depends on the standard classification analysis under Articles 6–7 and Annex III. An agentic system that manages customer support interactions is probably not high-risk. One that makes decisions about access to social services, or that operates in a safety-critical industrial environment, almost certainly is.

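The challenge is that Article 14 was written with the static input-output model in mind, not agents. "Interrupt the system" is well defined when the system is processing a single query. It is less clear when the system is midway through a ten-step plan involving database modifications, external API calls, and file system writes, because stopping at that point can leave some changes applied and others not. Article 14 itself speaks of bringing the system to a halt "in a safe state"; for an agent, that safe state has to be designed in.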
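One way to make "interrupt" well defined for an agent is to check a human-controlled stop signal between steps, so that a halt always lands on a known step boundary. The sketch below is hypothetical (PlanRunner and the step names are illustrative, not taken from any agent framework) and uses Python's threading.Event as the stop signal. Its main value is the list of completed steps, which tells the human exactly which changes were applied before the halt.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlanRunner:
    """Hypothetical runner: executes a multi-step plan and checks a
    human-controlled stop signal before each step."""
    steps: list[tuple[str, Callable[[], None]]]
    stop_signal: threading.Event = field(default_factory=threading.Event)
    completed: list[str] = field(default_factory=list)

    def run(self) -> str:
        for name, action in self.steps:
            # The stop signal is honoured only at step boundaries: a step
            # already in progress runs to completion before the halt.
            if self.stop_signal.is_set():
                return "halted"
            action()
            self.completed.append(name)
        return "finished"


# Usage: the operator presses "stop" while the first step is running.
runner = PlanRunner(steps=[])


def query_records() -> None:
    runner.stop_signal.set()  # simulates the human interrupt


runner.steps = [("query_records", query_records), ("update_records", lambda: None)]
status = runner.run()
print(status, runner.completed)  # halted ['query_records']
```

Checking only at step boundaries is a deliberate simplification: a long-running tool call would need its own cancellation mechanism, and a reviewer would use the completed-steps list to decide whether partial changes must be undone.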

What "human-in-the-loop" means in practice for agents

The phrase human-in-the-loop (HITL) covers a spectrum of arrangements that are not equivalent from a governance standpoint.

Human-in-the-loop — A human must approve each step before the agent proceeds. The most conservative design; also the one that defeats much of the purpose of an agentic system, since it reduces to a human executing a sequence of actions with AI suggestions.

Human-on-the-loop — The agent executes, but a human monitors in real time and can intervene. This is operationally feasible for systems that operate at human timescales, less so for those that can complete a task in seconds.

Human-in-command — The human defines the task, sets constraints, and retains the ability to halt the system, but does not monitor individual steps. The agent reports back when complete or when it encounters a decision it cannot resolve.

For most enterprise agentic deployments, human-in-command is the operating model. It preserves the efficiency benefit of delegation while retaining meaningful oversight. The governance question is what "meaningful" means — and what constraints the system operates under during the execution window.
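To make the difference between the three arrangements concrete, the sketch below encodes them as a single gate the agent consults before each step. It is a hypothetical illustration (OversightMode and gate are invented names). In particular, treating an unresolvable decision under human-on-the-loop as an escalation is an assumption, since the definitions above specify reporting back only for human-in-command.

```python
from enum import Enum


class OversightMode(Enum):
    IN_THE_LOOP = "in-the-loop"   # human approves every step
    ON_THE_LOOP = "on-the-loop"   # human monitors in real time and can intervene
    IN_COMMAND = "in-command"     # human sets task and constraints, can halt


def gate(mode: OversightMode, agent_can_resolve: bool) -> str:
    """Decide what happens before the agent's next step (hypothetical policy)."""
    if mode is OversightMode.IN_THE_LOOP:
        return "await_approval"
    if not agent_can_resolve:
        # In-command: the agent reports back when it is stuck. Applying the
        # same rule to on-the-loop is an assumption made for this sketch.
        return "escalate"
    if mode is OversightMode.ON_THE_LOOP:
        return "proceed_monitored"
    return "proceed"


for mode in OversightMode:
    print(mode.value, gate(mode, agent_can_resolve=True))
# in-the-loop await_approval
# on-the-loop proceed_monitored
# in-command proceed
```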

The design requirements that follow

If an agentic system is high-risk, meeting Article 14 requires explicit design decisions in at least the following areas.

Action classification and gating. Not all agent actions carry the same risk. A read-only action (web search, database query) is categorically different from a write action (sending an email, modifying a record, executing a financial transaction). Consequential write actions should require explicit approval before execution, even in an otherwise automated pipeline. This classification should be part of the system design, not an afterthought.
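A minimal sketch of design-time classification, assuming a fixed registry that maps each tool name to an action class. The tool names, the three classes, and the approve callback are hypothetical; in a real deployment the callback would route the request to a human approver. Tools missing from the registry are treated as consequential, so a newly added tool cannot bypass the gate by accident.

```python
from enum import Enum
from typing import Callable


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    CONSEQUENTIAL_WRITE = "consequential_write"


# Hypothetical tool registry: the classification is fixed at design time,
# not decided by the model at run time.
TOOL_CLASSES: dict[str, ActionClass] = {
    "web_search": ActionClass.READ,
    "db_query": ActionClass.READ,
    "update_record": ActionClass.WRITE,
    "send_email": ActionClass.CONSEQUENTIAL_WRITE,
    "execute_payment": ActionClass.CONSEQUENTIAL_WRITE,
}


def execute(tool: str, run: Callable[[], None], approve: Callable[[str], bool]) -> str:
    # Unregistered tools are treated as consequential (deny by default).
    action_class = TOOL_CLASSES.get(tool, ActionClass.CONSEQUENTIAL_WRITE)
    if action_class is ActionClass.CONSEQUENTIAL_WRITE and not approve(tool):
        return "blocked"
    run()
    return "executed"


def deny_all(tool: str) -> bool:
    return False  # stands in for a human who has not approved


print(execute("db_query", lambda: None, deny_all))    # executed
print(execute("send_email", lambda: None, deny_all))  # blocked
print(execute("drop_table", lambda: None, deny_all))  # blocked
```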

Reversibility constraints. Where possible, agent actions should be designed to be reversible. This is not always achievable — a sent email cannot be unsent — but the set of irreversible actions the agent can take without approval should be as small as the task allows. The irreversible action set is, effectively, the boundary of human oversight.
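One way to keep the irreversible action set small and explicit is to register every action with an optional compensating action, and to refuse any action that has none unless it was approved in advance. The sketch below assumes that design; Action, Executor and the example record are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks the action as irreversible


@dataclass
class Executor:
    approved_irreversible: set[str] = field(default_factory=set)
    undo_stack: list[Callable[[], None]] = field(default_factory=list)

    def run(self, action: Action) -> None:
        if action.undo is None and action.name not in self.approved_irreversible:
            raise PermissionError(f"{action.name} is irreversible and not approved")
        action.do()
        if action.undo is not None:
            self.undo_stack.append(action.undo)

    def rollback(self) -> None:
        # Revert in reverse order so later changes are undone first.
        while self.undo_stack:
            self.undo_stack.pop()()


record = {"status": "open"}
ex = Executor()
ex.run(Action("close_case",
              do=lambda: record.update(status="closed"),
              undo=lambda: record.update(status="open")))
try:
    ex.run(Action("send_email", do=lambda: None))  # irreversible, not approved
except PermissionError as err:
    print(err)  # send_email is irreversible and not approved
ex.rollback()
print(record)  # {'status': 'open'}
```

In this sketch the approved_irreversible set is the oversight boundary written down as data. Rollback can only revert reversible steps: once an approved irreversible action has run, there is no compensating entry for it.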

Audit trail. The agent's plan, the actions it took, the inputs and outputs at each step, and any decisions it made should be logged in a form that is accessible after the fact. This serves the Article 14 oversight requirement, the record-keeping (logging) requirement under Article 12, and the post-market monitoring obligations under Article 72. An agent that cannot explain what it did is not auditable.
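A minimal sketch of such a trail, assuming one JSON object per line (JSON Lines) with a hypothetical schema: a plan record at the start of the run, then one record per step with the tool, inputs, output and the decision taken. It uses only Python's json and datetime modules; a production system would write to durable, access-controlled storage rather than an in-memory list.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only JSON Lines audit trail for one agent run (hypothetical schema)."""

    def __init__(self, run_id: str, plan: list[str]) -> None:
        self.run_id = run_id
        self.lines: list[str] = []
        # The plan is logged before any action so reviewers can compare
        # what the agent intended with what it actually did.
        self._write("plan", {"steps": plan})

    def _write(self, event: str, payload: dict) -> None:
        record = {
            "run_id": self.run_id,
            "seq": len(self.lines),  # position in the run, for ordering
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            **payload,
        }
        self.lines.append(json.dumps(record, sort_keys=True))

    def step(self, tool: str, inputs: dict, output: object, decision: str) -> None:
        self._write("step", {"tool": tool, "inputs": inputs,
                             "output": output, "decision": decision})


log = AuditLog("run-001", plan=["look up applicant record", "draft summary"])
log.step("db_query", {"applicant_id": 42}, {"rows": 1}, decision="proceed")
for line in log.lines:
    print(line)
```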

Halting conditions. The system should be designed with explicit conditions under which it stops and escalates to a human rather than proceeding. These conditions should include: encountering a decision outside the defined action space; receiving an unexpected response from an external system; falling below a defined confidence threshold for a consequential action; and reaching a defined time or step limit.
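The four conditions can be checked before every step by a single function that returns a reason for halting, or None to proceed. This is a sketch under assumptions: HaltPolicy, its default thresholds, and the confidence score (which assumes the agent framework produces one for each proposed action) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HaltPolicy:
    allowed_actions: frozenset[str]
    min_confidence: float = 0.8    # hypothetical threshold
    max_steps: int = 20            # hypothetical step limit
    max_seconds: float = 300.0     # hypothetical time limit


def halt_reason(policy: HaltPolicy, action: str, *, consequential: bool,
                confidence: float, response_ok: bool, steps_taken: int,
                elapsed_seconds: float) -> Optional[str]:
    """Return why the agent must stop and escalate, or None to proceed."""
    if action not in policy.allowed_actions:
        return "action outside the defined action space"
    if not response_ok:
        return "unexpected response from an external system"
    if consequential and confidence < policy.min_confidence:
        return "confidence below threshold for a consequential action"
    if steps_taken >= policy.max_steps or elapsed_seconds >= policy.max_seconds:
        return "step or time limit reached"
    return None


policy = HaltPolicy(allowed_actions=frozenset({"db_query", "update_record"}))
print(halt_reason(policy, "update_record", consequential=True, confidence=0.65,
                  response_ok=True, steps_taken=3, elapsed_seconds=40.0))
# confidence below threshold for a consequential action
```

Returning a reason string rather than a bare boolean means the escalation message, and the audit record, can say why the agent stopped.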

Scope constraints in the system prompt. The agent's system prompt (or equivalent configuration) is the primary control surface for constraining agentic behaviour. It should explicitly define: what tools the agent may use; what data it may access; what categories of action it may not take without approval; and what it should do when it encounters an ambiguous situation. These constraints should be treated as part of the technical documentation, not as informal operational guidance.
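A sketch, assuming the constraints are kept as a single versioned data structure (AgentScope is a hypothetical name) that is rendered into the system prompt and also checked by the code that dispatches tool calls. The same object can then be quoted verbatim in the technical documentation, and a tool the prompt forbids is also refused outside the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical versioned scope definition for one agent deployment."""
    version: str
    allowed_tools: tuple[str, ...]
    allowed_data: tuple[str, ...]
    approval_required: tuple[str, ...]
    on_ambiguity: str

    def to_prompt(self) -> str:
        # Render the constraints as system-prompt text for the model.
        return "\n".join([
            f"Scope version: {self.version}",
            "Use only these tools: " + ", ".join(self.allowed_tools) + ".",
            "Access only these data sources: " + ", ".join(self.allowed_data) + ".",
            "Never do the following without human approval: "
            + ", ".join(self.approval_required) + ".",
            "When the situation is ambiguous: " + self.on_ambiguity,
        ])

    def permits_tool(self, tool: str) -> bool:
        # Enforced in the orchestration code, independently of the model.
        return tool in self.allowed_tools


scope = AgentScope(
    version="v1",
    allowed_tools=("web_search", "db_query"),
    allowed_data=("case_records",),
    approval_required=("sending external email", "modifying case records"),
    on_ambiguity="stop and ask the operator.",
)
print(scope.to_prompt())
print(scope.permits_tool("send_email"))  # False
```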

The GPAI layer

Many agentic systems are built on top of general-purpose AI (GPAI) models from providers such as Anthropic, OpenAI, Google, or Mistral. Under Chapter V of the Regulation, GPAI model providers have their own set of obligations, but those obligations do not extend to the downstream use case.

Concretely: if an organisation builds an agentic HR screening tool on top of a GPAI model, the GPAI provider's compliance under Chapter V is not a substitute for the high-risk obligations that attach to the tool itself. The organisation that builds the tool and puts it into service is its provider under the Regulation, and it is responsible for that system, including the way the GPAI model is configured and constrained for the specific use case.

This creates a practical gap in the supply chain. The downstream organisation needs documentation about the GPAI model's capabilities, limitations, and known failure modes: the information it needs to complete its own risk management process. Article 53(1)(b) requires GPAI model providers to make this information available to providers of AI systems that intend to integrate the model. In practice, the depth and accessibility of that documentation varies considerably between providers.

What to do before deployment

For organisations building or deploying agentic AI systems in potentially high-risk domains, the practical pre-deployment checklist is the following.

Classify the system. Does the agentic use case fall within Annex III? Work through each domain. Employment, access to services, critical infrastructure, and public administration are the domains where agentic deployments are most likely to be high-risk.

If high-risk, apply the full conformity framework. The fact that the system is agentic does not modify the conformity requirements — it makes them harder to satisfy, because the architecture requires more deliberate design to meet the human oversight standard.

Design the HITL model explicitly. Decide which actions require approval, which are reversible and which are not, what the escalation conditions are, and what the audit trail must contain. Record these decisions in the technical documentation required under Article 11 and Annex IV.

Obtain the necessary GPAI documentation from the foundation model provider. If the provider's documentation is insufficient for the risk management process, that gap needs to be addressed — either by requesting the information from the provider or by conducting the relevant testing independently.

Reference: Regulation (EU) 2024/1689, Articles 9, 14, 53. Recitals 58, 77.

BelkX Practice