AI Strategy, Data science & AI, Software development
Jun 22, 2026

Config as Code, MCP, and the New Infrastructure Primitives Your Engineering Team Needs to Manage AI at Scale

VECTOR Labs Team

Last updated on: Jun 23, 2026

Engineering organisations that scaled distributed device fleets in the 2010s learned a specific lesson about configuration drift: the moment a system's actual state diverges from its documented intended state, the team managing it is operating on assumption rather than evidence. That lesson was codified into infrastructure-as-code practices, GitOps workflows, and declarative configuration management tooling. The same divergence is now occurring in AI developer environments, where agent configurations, tool registries, model routing rules, and prompt scaffolding are accumulating across engineering teams without version control, without audit trails, and without any systematic mechanism for detecting when a production agent's behaviour no longer matches its original specification. The convergence of config-as-code discipline with the Model Context Protocol's emerging role as a programmable interface layer creates the conditions for addressing this directly, but only if engineering leaders make deliberate architectural decisions before the problem compounds.

This is a companion piece to our broader work on MCP architecture and agent design. See The MCP Stack: How Engineering Teams Should Architect AI Agents That Stay Accurate as the World Changes for a technical guide to tool and server design patterns, knowledge freshness tradeoffs, and grounding agents in authoritative data sources.

The Configuration Drift Problem Is Not New, But Its Surface Area Has Expanded

Configuration drift in device management was a tractable problem because the failure mode was visible: a server running the wrong kernel version would eventually produce a measurable fault. The configuration drift accumulating in AI agent environments is more insidious because the failure mode is behavioural rather than binary. An agent that was configured six months ago to call a specific internal API endpoint, apply a particular context window strategy, and route to a specific model version will continue operating after each of those parameters has been changed independently by different team members, none of whom updated a central record of intent. The agent does not crash. It produces outputs that are subtly wrong in ways that may not surface until a downstream system acts on them or a compliance audit traces a decision back to its source. The mechanism here is the same as in device management drift: distributed modification without centralised state reconciliation. The commercial implication is that the cost of remediation scales with the time between drift and detection, and in AI systems that time is typically measured in weeks or months rather than minutes.

What Config as Code Actually Means for AI Infrastructure

Config-as-code in traditional infrastructure means that the desired state of a system is expressed in human-readable, version-controlled files that a reconciliation engine can compare against observed state and correct. Applied to AI infrastructure, the same principle requires that every parameter governing an agent's behaviour, including its tool registry, its model selection logic, its context assembly rules, its fallback behaviour, and its output validation constraints, is expressed in a declarative format that lives in source control alongside application code. This is not primarily a tooling question. It is a discipline question about what counts as infrastructure. Most engineering teams currently treat prompt templates as product artefacts, model routing rules as application configuration, and MCP server definitions as operational notes. None of these are versioned with the rigour applied to Terraform modules or Kubernetes manifests, which means they carry none of the auditability, rollback capability, or change attribution that engineering organisations expect from production infrastructure.
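The reconciliation idea can be illustrated with a minimal sketch in Python. It assumes the desired state is a nested dictionary loaded from a version-controlled file and that the observed state can be read from the running agent in the same shape. The parameter names (`model.id`, `max_context_tokens`, `temperature`) and the model identifiers are hypothetical. The sketch only reports differences; correcting them is left to whatever deployment pipeline the team already runs.

```python
from typing import Any

_ABSENT = object()  # marks a parameter that exists on only one side


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys so values can be compared one by one."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def detect_drift(desired: dict[str, Any], observed: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {path: (desired_value, observed_value)} for every differing parameter.

    A parameter missing on one side is reported as None on that side.
    """
    want, have = flatten(desired), flatten(observed)
    drift: dict[str, tuple[Any, Any]] = {}
    for path in sorted(want.keys() | have.keys()):
        w = want.get(path, _ABSENT)
        h = have.get(path, _ABSENT)
        if w != h:
            drift[path] = (None if w is _ABSENT else w, None if h is _ABSENT else h)
    return drift


# Hypothetical desired state, as it might be read from source control.
desired = {
    "model": {"id": "example-model-2025-01"},
    "tools": ["search", "tickets"],
    "max_context_tokens": 8000,
}
# Hypothetical observed state, as it might be reported by the running agent.
observed = {
    "model": {"id": "example-model-2025-06"},
    "tools": ["search", "tickets"],
    "max_context_tokens": 8000,
    "temperature": 0.7,
}

for path, (want_value, have_value) in detect_drift(desired, observed).items():
    print(f"{path}: desired={want_value!r} observed={have_value!r}")
```

Here the model identifier was changed and a temperature setting was added without either change reaching the declared state, which is the kind of silent divergence described above.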

MCP as a Programmable Interface Layer

The Model Context Protocol provides a structured mechanism for AI agents to interact with external tools, data sources, and services through a defined interface contract. Its architectural significance for configuration management is that it externalises tool connectivity from the agent's core logic, which creates a natural boundary at which configuration can be expressed declaratively and managed independently. An MCP server definition specifies what tools are available, what schemas they accept, what authentication they require, and what error handling they implement. If that definition lives in a version-controlled repository and is deployed through a controlled pipeline, then the set of capabilities available to an agent at any point in time is a function of a known, auditable configuration state rather than an implicit product of whoever last modified the server registration. This is structurally identical to how network device configurations became manageable at scale: by making the interface contract the unit of version control rather than the device itself.
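As a rough illustration, a server definition can be held as plain data and fingerprinted so that a deployment pipeline can record exactly which definition was live. The file layout below (`server`, `version`, `auth`) is an assumption made for this sketch, not a format mandated by MCP. The per-tool fields `name`, `description`, and `inputSchema` follow the shape MCP uses when a server lists its tools. The server and tool names are hypothetical.

```python
import hashlib
import json

# Hypothetical declarative definition of one MCP server, as it might be stored in a repository.
SERVER_DEFINITION = {
    "server": "ticketing",
    "version": "1.4.0",
    "auth": {"type": "oauth2", "scopes": ["tickets:read"]},
    "tools": [
        {
            "name": "get_ticket",
            "description": "Fetch a support ticket by id.",
            "inputSchema": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        }
    ],
}


def fingerprint(definition: dict) -> str:
    """Hash a canonical JSON form so that key order does not change the result."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tool_names(definition: dict) -> list[str]:
    """List the capabilities this definition exposes to agents."""
    return sorted(tool["name"] for tool in definition["tools"])


print(tool_names(SERVER_DEFINITION), fingerprint(SERVER_DEFINITION)[:12])
```

Recording the fingerprint alongside each deployment gives a cheap way to answer which capability set an agent could see at a given time, without trusting anyone's memory of the last registration change.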

Tool Registry Governance

A tool registry in an MCP-based architecture is the authoritative record of which tool servers are available to which agents under which conditions. Without explicit governance, tool registries accumulate entries through individual team decisions and lose entries through server deprecations that are not propagated to all consuming agents. The result is an environment where agents may be calling tools that have been deprecated, calling different versions of the same tool depending on which registry snapshot they were initialised against, or failing silently when a tool is removed without a corresponding update to the agents that depended on it. Treating the tool registry as a versioned infrastructure artefact, with change review processes equivalent to those applied to API contracts, closes this failure mode.
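One way to surface these failure modes before they reach production is a check that runs in change review. The sketch below assumes a registry expressed as a mapping of tool names to entries and a per-agent declaration of the tool versions each agent expects. The `ToolEntry` type, the agent names, and the tool names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    name: str
    version: str
    deprecated: bool = False


def check_agents(registry: dict[str, ToolEntry], agents: dict[str, dict[str, str]]) -> list[str]:
    """Report agents that depend on missing, deprecated, or mismatched tools.

    `agents` maps an agent name to {tool name: version the agent expects}.
    """
    problems: list[str] = []
    for agent, tools in sorted(agents.items()):
        for tool, expected in sorted(tools.items()):
            entry = registry.get(tool)
            if entry is None:
                problems.append(f"{agent}: tool '{tool}' is not in the registry")
            elif entry.deprecated:
                problems.append(f"{agent}: tool '{tool}' is deprecated")
            elif entry.version != expected:
                problems.append(
                    f"{agent}: tool '{tool}' expects {expected}, registry has {entry.version}"
                )
    return problems


# Hypothetical registry and agent declarations.
REGISTRY = {
    "search": ToolEntry("search", "2.1.0"),
    "tickets": ToolEntry("tickets", "1.4.0", deprecated=True),
}
AGENTS = {
    "support-agent": {"search": "2.0.0", "tickets": "1.4.0"},
    "billing-agent": {"invoices": "1.0.0"},
}

for problem in check_agents(REGISTRY, AGENTS):
    print(problem)
```

Failing the pipeline when this list is non-empty turns a silent runtime failure into a visible review-time one.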

Model Routing and Version Pinning

Model routing rules determine which foundation model an agent calls for a given task class, and they are among the most consequential configuration parameters in an AI system because they directly affect output quality, latency, and cost. In practice, most engineering teams manage model routing through application-level environment variables or hardcoded defaults that are updated informally as new model versions become available. This means that two agents performing nominally equivalent tasks may be calling different model versions depending on when their deployment configuration was last touched, and a rollback to a previous agent behaviour may be impossible because the model version it depended on has been deprecated by the provider. Version pinning, combined with explicit routing rules expressed in version-controlled configuration, is the mechanism that makes agent behaviour reproducible across environments and over time.
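A minimal sketch of explicit routing with pinning follows. It assumes a naming convention in which a pinned model identifier ends in a date suffix, while a floating alias such as `-latest` does not. That convention and the model names are hypothetical; real provider identifiers vary, so the pattern would need adapting.

```python
import re

# Hypothetical routing manifest: task class -> model identifier.
ROUTING = {
    "summarise": "example-model-large-2025-03-01",
    "classify": "example-model-small-2025-03-01",
    "extract": "example-model-large-latest",  # floating alias, not reproducible
}

# Assumed convention: a pinned identifier ends with a YYYY-MM-DD suffix.
PINNED = re.compile(r".+-\d{4}-\d{2}-\d{2}")


def unpinned_routes(routing: dict[str, str]) -> list[str]:
    """Return task classes whose model identifier is not pinned to a dated version."""
    return sorted(task for task, model in routing.items() if not PINNED.fullmatch(model))


def route(task_class: str, routing: dict[str, str]) -> str:
    """Resolve a task class to its model, refusing floating aliases."""
    model = routing[task_class]  # raises KeyError if no rule exists for the task class
    if not PINNED.fullmatch(model):
        raise ValueError(f"route '{task_class}' uses unpinned model '{model}'")
    return model


print(unpinned_routes(ROUTING))
```

Keeping the manifest in one reviewed file means two agents doing equivalent work resolve to the same model version, and a change to that version appears as a diff.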

Tribal Knowledge as a Production Risk

Tribal knowledge in AI infrastructure takes a specific form: the engineers who built a given agent know which prompt engineering decisions were made to compensate for a specific model's weaknesses, which tool call sequences were ordered to avoid a known race condition, and which context window limits were set to stay within a cost budget. That knowledge is not in the code. It is not in the configuration files. It is in the heads of two or three people. When those people leave, are reorganised, or are simply unavailable during an incident, the team inherits a system it cannot safely modify. The risk compounds as AI tooling proliferates, because the number of agents and tool configurations grows faster than the number of people who understand any given one of them. Codifying configuration in declarative, commented, version-controlled files is not bureaucratic overhead. It is the mechanism by which institutional knowledge is transferred from individuals to the system itself.

The Audit and Compliance Dimension

Regulated industries face a more immediate version of this problem. When an AI agent participates in a decision that is subject to audit, the auditor will ask what configuration the agent was running at the time of the decision, what tools it had access to, and what model it called. If the answer to any of those questions requires interviewing a developer rather than querying a version control system, the organisation has a compliance gap. This is not a hypothetical risk in financial services, healthcare, or any sector where AI-assisted decisions are beginning to attract regulatory scrutiny. The EU AI Act's requirements around technical documentation and traceability for high-risk AI systems make configuration auditability a legal obligation for a growing set of use cases, not merely an engineering best practice. The architectural decision to treat agent configuration as version-controlled infrastructure is therefore also a regulatory risk management decision, and it is substantially cheaper to make that decision before a system is in production than after an audit identifies the gap.
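To make the auditor's question concrete, the sketch below assumes the deployment pipeline keeps a time-ordered log of which configuration commit was deployed when. Given the timestamp of a decision, the active commit is the most recent deployment at or before that time. The commit identifiers and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import Optional

# Hypothetical deployment log: (deployed_at, configuration commit), sorted by time.
DEPLOYMENTS = [
    (datetime(2026, 1, 10, 9, 0, tzinfo=timezone.utc), "a1b2c3d"),
    (datetime(2026, 3, 2, 14, 30, tzinfo=timezone.utc), "e4f5a6b"),
    (datetime(2026, 5, 20, 8, 15, tzinfo=timezone.utc), "c7d8e9f"),
]


def config_at(decision_time: datetime, deployments: list[tuple[datetime, str]]) -> Optional[str]:
    """Return the commit that was live at `decision_time`, or None if nothing was deployed yet."""
    times = [deployed_at for deployed_at, _ in deployments]
    index = bisect_right(times, decision_time)  # count of deployments at or before the decision
    if index == 0:
        return None
    return deployments[index - 1][1]


decision = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
print(config_at(decision, DEPLOYMENTS))
```

With the commit in hand, the tool registry, routing rules, and prompt scaffolding in force at the time of the decision can be read directly from the repository.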

Implementation Architecture: What to Standardise First

The practical question for a Head of Engineering is not whether to apply config-as-code discipline to AI infrastructure, but where to start given that most teams are already running agents in production without it. The highest-leverage starting point is the MCP server registry, because it is the boundary layer that governs what every agent can do. Expressing server definitions in a declarative format, storing them in a repository with branch protection and change review, and deploying them through a pipeline that records the deployed state creates an audit trail for tool availability without requiring changes to existing agents. Model version pinning is the second priority, because it is the configuration parameter most likely to cause silent behavioural regression when changed informally. Prompt templates and context assembly rules are the third priority, and they require the most organisational change because they are currently treated as product artefacts rather than infrastructure. Moving them into version control requires agreement between product and engineering on ownership and change processes, which is an organisational decision as much as a technical one.

Where Vector Labs Fits

Vector Labs designs and implements production MCP architectures and AI agent infrastructure for engineering organisations that need auditability, reproducibility, and governance built in from the start. Our work on MCP stack design, including tool server architecture, knowledge freshness controls, and agent grounding, is detailed in The MCP Stack: How Engineering Teams Should Architect AI Agents That Stay Accurate as the World Changes. To discuss how these principles apply to your current AI infrastructure, contact us at vector-labs.ai/contacts.

FAQs

What is the minimum viable config-as-code implementation for a team already running agents in production?

The minimum viable starting point is to place MCP server definitions and model routing rules into a version-controlled repository with a documented change review process. This does not require migrating existing agents or rewriting prompt templates. It establishes a known state for the two configuration parameters most likely to cause silent behavioural regression, and it creates an audit trail for future changes. Prompt templates and context assembly rules can be migrated incrementally once the higher-priority configuration boundaries are under control.

How does MCP's tool registry interact with existing API gateway infrastructure?

An MCP server registry and an API gateway serve different functions and typically coexist rather than replace each other. The API gateway manages authentication, rate limiting, and traffic routing for service-to-service communication. The MCP tool registry manages the interface contracts that agents use to discover and call tools, including schema definitions, capability declarations, and versioning metadata. In a well-structured architecture, MCP servers sit behind the API gateway and are registered in the tool registry with references to the gateway endpoints they use. This means the tool registry governs what agents can do, while the gateway governs how those calls are executed and secured.

How should model version pinning be handled when a provider deprecates a model version?

Model deprecations should be treated as a planned infrastructure change rather than an operational surprise. The practical mechanism is to maintain a model version manifest in version control that maps agent identifiers to pinned model versions, and to subscribe to provider deprecation notices so that upcoming deprecations appear as known future work in the engineering backlog. When a deprecation is announced, the migration to a replacement version should go through the same change review process as any other configuration change, including a validation step that confirms the replacement model produces acceptably equivalent outputs, against a threshold agreed in advance, on a representative sample of the agent's task distribution before the change is promoted to production.
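The manifest and validation step can be sketched as follows. The agent names, model identifiers, and retirement date are hypothetical, and the retirement table is assumed to be maintained by hand from provider notices. The agreement measure is deliberately crude (exact string match); a real validation step would likely use a task-specific comparison.

```python
from datetime import date, timedelta

# Hypothetical manifest: agent identifier -> pinned model version.
PINS = {
    "claims-triage": "example-model-2025-01-15",
    "support-summary": "example-model-2025-06-01",
}
# Hypothetical retirement dates, copied from provider deprecation notices.
RETIREMENTS = {"example-model-2025-01-15": date(2026, 9, 30)}


def upcoming_migrations(
    pins: dict[str, str],
    retirements: dict[str, date],
    today: date,
    horizon_days: int = 90,
) -> list[tuple[str, str, date]]:
    """List (agent, model, retirement date) for pins retiring within the horizon."""
    cutoff = today + timedelta(days=horizon_days)
    due = []
    for agent, model in sorted(pins.items()):
        retires = retirements.get(model)
        if retires is not None and retires <= cutoff:
            due.append((agent, model, retires))
    return due


def agreement_rate(baseline: list[str], candidate: list[str]) -> float:
    """Fraction of sample tasks where the replacement model's output matches exactly."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("samples must be non-empty and the same length")
    matches = sum(b == c for b, c in zip(baseline, candidate))
    return matches / len(baseline)


print(upcoming_migrations(PINS, RETIREMENTS, today=date(2026, 7, 15)))
```

Running the first function on a schedule turns a provider notice into a backlog item, and gating promotion on the second makes the acceptance threshold an explicit, reviewable number.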

What does the EU AI Act require specifically in terms of AI configuration documentation?

For high-risk AI systems as defined under the EU AI Act, Article 11 requires technical documentation that is sufficiently detailed to allow assessment of conformity, with the minimum contents set out in Annex IV, including a description of the system's components, the data used, and the design choices made. For AI systems that incorporate foundation models accessed through external APIs, this in practice means documenting which model versions are used, under what conditions, and how that changes over time. Version-controlled configuration files that record model routing rules, tool registries, and prompt scaffolding directly support this documentation requirement, because they provide a timestamped record of the system's configuration state at any point in its operational history. Organisations that cannot reconstruct that state from version control will need to reconstruct it from developer recollection, which is not a defensible audit position.

How do you handle configuration management for agents that are dynamically assembled at runtime rather than statically defined?

Dynamically assembled agents, where tool sets and routing rules are determined at runtime based on task classification or user context, require a different approach to configuration management than statically defined agents. The configuration unit shifts from the individual agent to the assembly rules themselves: the logic that determines which tools are selected, which model is called, and which context strategy is applied. Those assembly rules should be expressed in version-controlled policy files rather than embedded in application code, so that changes to agent assembly behaviour are subject to the same review and audit processes as changes to static agent configurations. The runtime state of any given agent invocation should also be logged with sufficient detail to reconstruct the configuration that was active during that invocation, which is the mechanism that makes dynamic systems auditable.
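A small sketch of this pattern follows: the assembly rules live in a policy structure that could be loaded from a version-controlled file, and each invocation logs the resolved configuration together with a hash of it. The policy layout, task names, tool names, and model identifiers are hypothetical.

```python
import hashlib
import json

# Hypothetical assembly policy, as it might be loaded from a version-controlled file.
POLICY = {
    "version": "2026-06-01",
    "rules": [
        {"when": {"task": "billing"}, "tools": ["invoices", "payments"],
         "model": "example-model-large-2025-03-01"},
        {"when": {"task": "general"}, "tools": ["search"],
         "model": "example-model-small-2025-03-01"},
    ],
}


def assemble(task: str, policy: dict) -> dict:
    """Resolve the tools and model for a task using the first matching rule."""
    for rule in policy["rules"]:
        if rule["when"]["task"] == task:
            return {
                "policy_version": policy["version"],
                "task": task,
                "tools": list(rule["tools"]),
                "model": rule["model"],
            }
    raise LookupError(f"no assembly rule for task '{task}'")


def invocation_record(invocation_id: str, resolved: dict) -> dict:
    """Build a log entry from which the active configuration can be reconstructed."""
    canonical = json.dumps(resolved, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"invocation_id": invocation_id, "config": resolved, "config_sha256": digest}


record = invocation_record("inv-0001", assemble("billing", POLICY))
print(json.dumps(record, indent=2))
```

Because the record carries both the policy version and the resolved values, an auditor can check a single invocation against the policy file as it stood at that version.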

What organisational structure supports config-as-code discipline for AI infrastructure at scale?

The most effective structure we have observed places ownership of the MCP tool registry and model routing manifest with a platform or AI infrastructure team that operates with the same change management discipline as a DevOps or SRE function, rather than with individual product teams. Product teams retain ownership of the agents they build and the prompt templates they develop, but changes to shared infrastructure, meaning tool servers, model routing rules, and context assembly policies that affect multiple agents, go through the platform team's review process. This creates a clear boundary between product-scoped configuration and infrastructure-scoped configuration, which is the same boundary that made config-as-code tractable in device management: separating the things individual teams own from the things the organisation owns.
