Nokia's publicly documented deployment of an agentic network assurance system, built on Google Cloud infrastructure, is one of the few production multi-agent architectures in a high-stakes operational domain that have been described in sufficient technical detail to analyze structurally. The system routes network events through a pipeline of six specialized agents, each responsible for a distinct reasoning task, before any remediation action is taken. That design choice, separating anomaly detection from action recommendation and both from execution, encodes a set of engineering principles that generalize well beyond telecom to any domain where automated actions are difficult or impossible to reverse.
The Core Architecture: Six Agents, One Pipeline
The Nokia Assurance Center architecture assigns each agent a bounded scope. An intake agent classifies and routes incoming network events. Separate agents handle anomaly detection, root-cause reasoning, and remediation recommendation. A catalog-lookup agent matches recommended actions against a pre-approved automation catalog before any execution occurs. A final validation layer checks proposed actions against operational constraints before they are dispatched.
This is not a monolithic model with a chain-of-thought prompt. Each agent holds a distinct model configuration, toolset, and output schema. The separation means that a failure or hallucination in the anomaly detection agent produces a structured error that the downstream reasoning agent can handle explicitly, rather than a corrupted intermediate state that propagates silently to execution.
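To make the idea of a structured error concrete, here is a minimal sketch of a typed handoff between a detection agent and a root-cause agent. The class names, field names, and the severity range are illustrative assumptions, not Nokia's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnomalyRecord:
    """Hypothetical output schema for the detection agent."""
    event_id: str
    metric: str
    severity: float  # assumed range: 0.0 to 1.0


@dataclass(frozen=True)
class AgentResult:
    """Envelope returned by an agent: either a payload or an explicit error."""
    agent: str
    payload: Optional[AnomalyRecord] = None
    error: Optional[str] = None


def parse_detection_output(raw: dict) -> AgentResult:
    """Validate raw model output against the schema before handing it on."""
    try:
        severity = float(raw["severity"])
        if not 0.0 <= severity <= 1.0:
            raise ValueError(f"severity out of range: {severity}")
        record = AnomalyRecord(str(raw["event_id"]), str(raw["metric"]), severity)
        return AgentResult(agent="detection", payload=record)
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed output becomes a structured error, not a corrupted record.
        return AgentResult(agent="detection", error=f"{type(exc).__name__}: {exc}")


def root_cause_step(result: AgentResult) -> str:
    """Downstream agent branches on the error explicitly."""
    if result.error is not None:
        return "escalate"
    return f"analyze:{result.payload.metric}"
```

A malformed detection output (for example, one missing the severity field) reaches the root-cause step as an error it can branch on, rather than as a half-filled record.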
Why Specialization Reduces Failure Surface
A single large model handling the full pipeline from event intake to remediation must resolve conflicting objectives within one inference pass: classify the signal, reason about cause, and constrain the action to what is operationally safe. In practice, these objectives pull against each other. A model optimized for broad anomaly recall will tend to over-flag, which is acceptable at the detection stage but produces excessive remediation candidates if the same model is also selecting actions.
Decomposing these objectives across agents allows each model to be evaluated and tuned independently. Detection recall can be maximized without inflating the false-positive rate on remediation recommendations, because the two tasks are handled by separate components with separate evaluation criteria. The practical implication is that failure modes become attributable: when a remediation is incorrect, the engineering team can identify whether the error originated in detection, root-cause reasoning, or action selection, and address it without retraining the full system.
The Automation Catalog as a Structural Guardrail
The catalog-lookup step is the most consequential design choice in the pipeline. Before any remediation action is dispatched, the recommendation agent's output is matched against a pre-defined catalog of approved automations. Actions that have no catalog entry are blocked regardless of how confident the recommendation agent is.
This creates a hard constraint that is independent of model behavior. Even if the recommendation agent produces a plausible but novel remediation, the catalog acts as an allowlist that prevents execution. The practical effect is that the action space available to the system at runtime is bounded by what human operators have explicitly approved in advance, rather than by what the model believes is correct.
For engineering teams designing agentic systems in adjacent domains, such as cloud infrastructure management or industrial process control, the catalog pattern offers a way to enforce operational policy without embedding it entirely in prompt instructions. Prompt-based constraints are soft: a model can fail to follow them when it is confident in a conflicting output, or when it is given adversarial inputs. A catalog enforced at the tool-call layer cannot be bypassed that way, because the check runs outside the model.
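The catalog gate can be sketched as a plain lookup that sits in front of the dispatch call. The catalog entries, the `dispatch` function, and the `ActionNotApproved` exception below are hypothetical names chosen for illustration; the point is only that the allow/deny decision never consults the model's confidence.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CatalogEntry:
    action_id: str
    description: str


class ActionNotApproved(Exception):
    """Raised when a proposed action has no catalog entry."""


# Illustrative catalog; a real one would be maintained by operators.
APPROVED_CATALOG: Dict[str, CatalogEntry] = {
    "restart_interface": CatalogEntry("restart_interface", "Restart a single interface"),
    "reroute_traffic": CatalogEntry("reroute_traffic", "Shift traffic to a backup path"),
}


def dispatch(action_id: str, model_confidence: float,
             catalog: Dict[str, CatalogEntry] = APPROVED_CATALOG) -> CatalogEntry:
    """Gate in front of execution.

    model_confidence is accepted only to show that it plays no part in the
    allow/deny decision: the catalog lookup alone decides.
    """
    entry = catalog.get(action_id)
    if entry is None:
        raise ActionNotApproved(action_id)
    return entry
```

A low-confidence request for a cataloged action passes, and a high-confidence request for an uncataloged action is rejected, which is the inverse of what a confidence-thresholded design would do.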
Separation of Anomaly Detection from Action Recommendation
The architecture keeps anomaly detection and action recommendation in distinct agents with no shared state between them. The detection agent produces a structured anomaly record. The root-cause agent receives that record as input and produces a causal hypothesis. The recommendation agent receives the hypothesis and queries the catalog. At no point does a single agent hold the full context from raw event to proposed action.
This separation has two effects. First, it limits the context window each agent must process, which reduces the probability of attention drift on long event sequences. Second, it creates explicit handoff points where human operators can inspect intermediate outputs without interrupting the pipeline. In a regulated environment, those handoff points also serve as audit checkpoints.
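The handoff structure described above can be sketched as three functions that each receive only the previous stage's record, with a callback fired at every handoff. The agent functions are rule-based stand-ins for model calls, and all record types and field names are assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AnomalyRecord:
    event_id: str
    metric: str


@dataclass(frozen=True)
class CausalHypothesis:
    event_id: str
    cause: str


@dataclass(frozen=True)
class Recommendation:
    event_id: str
    action_id: str


def detect(raw_event: dict) -> AnomalyRecord:
    # Stand-in for the detection agent; a real one would call a model.
    return AnomalyRecord(raw_event["id"], raw_event["metric"])


def explain(anomaly: AnomalyRecord) -> CausalHypothesis:
    # Sees only the anomaly record, never the raw event.
    cause = "link_degradation" if anomaly.metric == "packet_loss" else "unknown"
    return CausalHypothesis(anomaly.event_id, cause)


def recommend(hypothesis: CausalHypothesis) -> Recommendation:
    # Sees only the causal hypothesis.
    action = "reroute_traffic" if hypothesis.cause == "link_degradation" else "escalate"
    return Recommendation(hypothesis.event_id, action)


def run_pipeline(raw_event: dict,
                 on_handoff: Callable[[str, dict], None]) -> Recommendation:
    """Run the stages in order, exposing each intermediate output to an observer."""
    anomaly = detect(raw_event)
    on_handoff("detection", asdict(anomaly))
    hypothesis = explain(anomaly)
    on_handoff("root_cause", asdict(hypothesis))
    rec = recommend(hypothesis)
    on_handoff("recommendation", asdict(rec))
    return rec


audit_log: List[Tuple[str, dict]] = []
result = run_pipeline(
    {"id": "e1", "metric": "packet_loss", "raw": "unparsed payload"},
    lambda stage, output: audit_log.append((stage, output)),
)
```

The observer here only appends to a list, but the same hook is where an operator review step or an audit write could be attached without changing the agents themselves.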
Guardrail Enforcement at the Architecture Level
Nokia's design enforces guardrails through structure rather than through model instruction alone. The catalog lookup is a tool call with a defined schema. The validation layer checks proposed actions against operational windows, affected customer counts, and change-freeze periods before dispatch. These are not model-evaluated constraints: they are deterministic checks executed by non-model components.
This distinction matters because LLM-based guardrails are probabilistic. A model instructed not to take disruptive actions during peak traffic will comply most of the time, but compliance is not guaranteed across all input distributions. A deterministic check on a timestamp and a customer-impact threshold will not hallucinate. Engineering teams designing agentic systems for operational domains should treat any constraint that must hold without exception as a candidate for deterministic enforcement, not prompt-based enforcement.
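A deterministic validation layer of the kind described can be very small. The sketch below checks a proposed action against a maintenance window, a customer-impact threshold, and change-freeze periods; every threshold, date, and name in it is an illustrative assumption rather than a value from Nokia's deployment.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Tuple


@dataclass(frozen=True)
class ProposedAction:
    action_id: str
    scheduled_at: datetime
    affected_customers: int


# All values below are assumed for illustration.
MAINTENANCE_WINDOW = (time(1, 0), time(5, 0))  # 01:00 to 05:00 local time
MAX_AFFECTED_CUSTOMERS = 500
CHANGE_FREEZES: List[Tuple[datetime, datetime]] = [
    (datetime(2025, 12, 20), datetime(2026, 1, 2)),
]


def validate(action: ProposedAction) -> List[str]:
    """Return the list of violated constraints; an empty list means dispatch is allowed."""
    violations = []
    start, end = MAINTENANCE_WINDOW
    if not (start <= action.scheduled_at.time() < end):
        violations.append("outside_maintenance_window")
    if action.affected_customers > MAX_AFFECTED_CUSTOMERS:
        violations.append("customer_impact_too_high")
    if any(lo <= action.scheduled_at < hi for lo, hi in CHANGE_FREEZES):
        violations.append("change_freeze_active")
    return violations
```

Because the function contains no model call, the same input always produces the same verdict, and each violation is named so it can be logged or shown to an operator.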
Implications for Pipeline Depth and Latency
A six-agent sequential pipeline introduces latency that a single-model architecture does not. Each agent adds an inference call, and each handoff adds serialization and routing overhead. In Nokia's case, the deployment targets network assurance events that are typically resolved over minutes to hours, so pipeline latency measured in seconds is acceptable. The same architecture applied to a domain with sub-second response requirements would require a different decomposition strategy, likely with parallel agent execution for independent reasoning tasks and sequential execution only where strict ordering is necessary.
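One way to picture the alternative decomposition is with `asyncio`: tasks that need only the original event run concurrently, and only the step that depends on their outputs waits. The agent names and the choice of which tasks are independent are hypothetical, and `call_agent` simulates inference latency with a sleep.

```python
import asyncio


async def call_agent(name: str, payload: dict, delay: float = 0.05) -> dict:
    # Stand-in for an inference call; the sleep simulates model latency.
    await asyncio.sleep(delay)
    return {"agent": name, "input_keys": sorted(payload)}


async def handle_event(event: dict) -> dict:
    # Assumed independent: both tasks need only the original event,
    # so they can run concurrently.
    topology, anomaly = await asyncio.gather(
        call_agent("topology_lookup", event),
        call_agent("anomaly_scoring", event),
    )
    # Strictly ordered: the recommendation depends on both earlier outputs.
    return await call_agent(
        "recommendation", {"topology": topology, "anomaly": anomaly}
    )


result = asyncio.run(handle_event({"id": "e1", "metric": "packet_loss"}))
```

With equal per-agent latency, this arrangement waits for roughly two inference delays instead of the three a fully sequential version would incur. The saving grows with the number of tasks that can share a parallel stage.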
Engineering teams should map their domain's tolerance for remediation latency before committing to a pipeline depth. A six-agent sequential design is appropriate when the cost of an incorrect action exceeds the cost of a delayed correct one. In domains where the inverse is true, shallower pipelines with stronger single-agent models and tighter catalog constraints are likely a better tradeoff.
What This Architecture Does Not Solve
The catalog pattern bounds the action space but does not address catalog maintenance. As network configurations evolve, new remediation actions must be reviewed, approved, and added to the catalog before the system can use them. If catalog updates lag behind operational change, the system will increasingly fall back to human escalation for novel scenarios, reducing its effective automation rate over time.
This is an organizational engineering problem as much as a technical one. The catalog must be treated as a living artifact with a defined review and approval process, not as a one-time configuration. Teams that treat the catalog as static will find that the system's autonomy degrades as the operational environment changes.
Where Vector Labs Fits
We design and build production multi-agent systems for operational and infrastructure-adjacent domains, with particular attention to failure attribution, guardrail architecture, and the boundary between model-evaluated and deterministic constraints. Our work on predictive maintenance for mission-critical X-ray security equipment at high-security locations, detailed at vector-labs.ai/case-studies/predictive-maintenance-xray-security, demonstrates how layered ML architectures can be structured for condition-based intervention with clear failure detection boundaries. If you are designing an agentic system for an operational domain and need to establish the right decomposition and guardrail strategy, contact us at vector-labs.ai/contacts.
FAQs
The number of agents should follow from the number of distinct reasoning objectives in the task, not from a target architecture shape. If two objectives can be evaluated and tuned independently, and if conflating them in a single model would create competing optimization pressures, they are candidates for separation. The Nokia architecture uses six agents because event intake and routing, anomaly detection, root-cause reasoning, action recommendation, catalog validation, and operational constraint checking are genuinely distinct tasks with different evaluation criteria. Adding agents beyond what the task decomposition requires increases latency and operational complexity without a corresponding reduction in failure surface.
Prompt-based guardrails instruct a model to avoid certain behaviors. They are probabilistic: a well-constructed prompt will suppress disallowed outputs across most inputs, but not all. Structural guardrails enforce constraints outside the model, at the tool-call or API layer, using deterministic logic. The distinction matters when a constraint must hold without exception, such as blocking actions during a change-freeze window or preventing execution of any action not in an approved catalog. For those constraints, deterministic enforcement is necessary. Prompt-based guardrails are appropriate for shaping model behavior in ways where occasional non-compliance is recoverable.
The catalog should be treated as a versioned artifact with a formal review and approval workflow, analogous to how infrastructure change management processes handle new runbooks. Each new remediation action should require sign-off from operations and, where relevant, risk or compliance stakeholders before it is added. The catalog version in use at the time of any automated action should be logged alongside the action record, so that post-incident analysis can determine whether the catalog was current. Teams that do not establish this process will find that catalog staleness becomes the primary constraint on the system's effective automation rate within twelve to eighteen months of deployment.
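Logging the catalog version with each action can be as simple as stamping the action record with a content hash of the catalog. This is a sketch under assumptions: the hash-based versioning scheme, the record fields, and the function names are illustrative, and a team might equally use a version number from its change-management tool.

```python
import hashlib
import json
from dataclasses import dataclass


def catalog_version(catalog: dict) -> str:
    """Content hash: any change to the catalog yields a new version string."""
    canonical = json.dumps(catalog, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


@dataclass(frozen=True)
class ActionRecord:
    event_id: str
    action_id: str
    catalog_version: str


def record_action(event_id: str, action_id: str, catalog: dict) -> ActionRecord:
    """Stamp the action with the version of the catalog that approved it."""
    if action_id not in catalog:
        raise KeyError(f"action not in catalog: {action_id}")
    return ActionRecord(event_id, action_id, catalog_version(catalog))
```

Serializing with sorted keys makes the version depend only on catalog content, not on the order in which entries were added.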
Each agent in the pipeline should produce a structured output with a defined schema, and that output should be logged at the handoff point before the next agent receives it. When a remediation is incorrect, the investigation starts at the point of execution and traces backward through the logged intermediate outputs: was the action recommendation wrong given a correct causal hypothesis, or was the causal hypothesis wrong given a correct anomaly record? This requires that each agent's output be independently interpretable, which is one of the practical reasons to avoid shared mutable state between agents. Without logged intermediate outputs, failure attribution in a sequential pipeline collapses into end-to-end debugging, which is significantly more expensive.
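The backward trace can be expressed as a short loop over the logged handoffs. In this sketch, the stage names and the `is_correct` callback are assumptions: the callback stands for a human reviewer's judgment of each logged output during post-incident analysis.

```python
from typing import Callable, Dict, Optional

# Assumed stage order, from first to last.
STAGES = ["detection", "root_cause", "recommendation"]


def attribute_failure(handoff_log: Dict[str, dict],
                      is_correct: Callable[[str, dict], bool]) -> Optional[str]:
    """Walk backward from the final stage and return the stage where the error originated.

    The walk stops at the first stage whose output is judged correct; the
    stage immediately after it is where the error was introduced. Returns
    None when the final output itself is judged correct.
    """
    faulty = None
    for stage in reversed(STAGES):
        if is_correct(stage, handoff_log[stage]):
            break
        faulty = stage
    return faulty
```

If the recommendation and the causal hypothesis are both judged wrong but the anomaly record is judged right, the function returns `"root_cause"`: the recommendation agent was working from a bad hypothesis, so the fix belongs upstream.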
A multi-agent architecture adds operational complexity: more inference endpoints to maintain, more handoff logic to debug, and more surface area for serialization errors. If the task does not contain genuinely separable reasoning objectives, or if the domain's latency requirements are tighter than a sequential pipeline can accommodate, a single model with well-structured tool calls and deterministic post-processing is often more appropriate. The case for decomposition is strongest when different stages of the task require different evaluation criteria, when intermediate outputs need to be auditable, or when the cost of a wrong action is high enough to justify the overhead of catalog validation and constraint checking as discrete pipeline stages.
The pipeline should route any action recommendation that has no catalog match to a human escalation queue with the full intermediate reasoning context attached: the anomaly record, the causal hypothesis, and the recommended action. This gives the human operator the information needed to act quickly and, if the action is appropriate, to initiate a catalog addition request. Tracking the volume and category of escalations over time provides a direct signal of catalog coverage gaps. If escalation rates are rising, it indicates that the operational environment is changing faster than the catalog review process can accommodate, which is a process problem that requires a governance response rather than a model change.
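The escalation path can be sketched as a router that either dispatches or enqueues, with a counter that tracks which uncataloged actions are being requested. The queue structure, field names, and `route` function are hypothetical; a production system would persist the queue rather than hold it in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Escalation:
    """Full intermediate context handed to the human operator."""
    anomaly: dict
    hypothesis: dict
    recommended_action: str


@dataclass
class EscalationQueue:
    items: List[Escalation] = field(default_factory=list)
    by_action: Counter = field(default_factory=Counter)

    def add(self, escalation: Escalation) -> None:
        self.items.append(escalation)
        # Counting by requested action surfaces catalog coverage gaps.
        self.by_action[escalation.recommended_action] += 1


def route(anomaly: dict, hypothesis: dict, action_id: str,
          catalog: set, queue: EscalationQueue) -> str:
    """Dispatch cataloged actions; escalate everything else with its context."""
    if action_id in catalog:
        return "dispatch"
    queue.add(Escalation(anomaly, hypothesis, action_id))
    return "escalated"
```

Reviewing `by_action.most_common()` periodically gives a ranked list of candidate catalog additions, which is the coverage signal described above.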

