Behavior-Space Architecture

also known as Behaviour-Space Architecture, Query-Selected Subsystem Routing

Treat a deployed agent as a space of behaviors over a pool of subsystems and let a router pick, per query, the minimal disjoint subset that query needs, so the effective architecture emerges per query.

Context

A team assembles an agent from several heavyweight subsystems: a plain tool loop, a retrieval-augmented pipeline, a knowledge graph, a multi-agent sub-team, a cache, and tiered memory. Most production traffic is mixed: a greeting needs none of them, a lookup needs only retrieval, an audit query needs the graph and memory, and a planning request needs the sub-team. Wiring every query through the full stack is the usual default, and it adds latency, cost, and brittleness to every request that needs only part of that stack.

Problem

A single fixed pipeline forces every request through every subsystem it bundles, which pays for retrieval, graph traversal, and multi-agent fan-out even when a query needs none of them. Each extra layer is a feature and a failure surface at once, so a maximal architecture maximises both spend and the ways a simple query can break. The system needs a way to spend only the subsystems a given query actually requires.

Forces

A richer subsystem pool answers more query types, but invoking the whole pool per query multiplies latency, cost, and the number of components that can fail.
The most-tuned subsystem is rarely the most-invoked one, so optimisation effort usually goes to paths that routing seldom sends traffic through.
A deterministic per-query selector is inspectable and testable, but it must be kept correct as subsystems are added or retired; otherwise it routes queries to the wrong subset.

Example

An enterprise support agent fronts a tool loop, a vector retriever, a knowledge graph of entitlements, tiered memory, and a planning sub-team. A user typing 'thanks' activates the bare loop and nothing else. 'What's my plan limit?' activates retrieval alone. 'Was my account ever over its quota, and what changed?' activates the graph plus memory. 'Draft a migration plan for my workspace' activates the planning sub-team. No query ever runs the full stack, and a new subsystem is added only when a recurring query class needs one.

Diagram

flowchart TD
    Q[Query] --> R{Router: classify intent + needed capabilities}
    R -->|trivial| L[Tool loop]
    R -->|lookup| RAG[Retrieval]
    R -->|relational| KG[Knowledge graph + memory]
    R -->|planning| MA[Multi-agent sub-team]
    L --> A[Compose active subsystems]
    RAG --> A
    KG --> A
    MA --> A
    A --> Resp[Response]

Solution

Therefore:

Model the agent as a pool of independent subsystems and a router rather than a wired graph of layers. For each incoming query the router classifies intent and required capabilities, then activates the minimal subset of subsystems that query needs and bypasses the rest; a trivial query may activate the bare loop alone, a complex one a retrieval-plus-graph-plus-memory subset. The subsets are disjoint and non-nested across query classes, so the architecture that actually runs is a property that emerges query by query instead of a shape chosen once at build time. Design effort concentrates on the routing heuristics and their evaluation, and a new subsystem is built only when a query class is shown to route through it, so the pool grows on demand rather than upfront.
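The routing step can be sketched in a few lines. This is a minimal sketch that assumes the support-agent example's four query classes. The subsystem stubs, the keyword rules in `classify`, and the names `POOL`, `ROUTES`, `CALLS`, and `handle` are all hypothetical stand-ins. A real router would use an intent/capability classifier rather than keyword matching. The sketch shows the two structural points of the pattern: the pool is a flat registry rather than a wired graph, and only the selected subset is ever invoked.

```python
from typing import Callable, Dict, FrozenSet, List

# Records which subsystems actually ran, so bypassing is observable.
CALLS: List[str] = []


def make_stub(name: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real subsystem (tool loop, retriever, graph, ...)."""
    def run(query: str) -> str:
        CALLS.append(name)
        return f"{name}-output"
    return run


# The pool: independent subsystems with no wiring between them.
POOL: Dict[str, Callable[[str], str]] = {
    name: make_stub(name) for name in ("loop", "retrieval", "graph", "memory", "team")
}

# Query class -> minimal subset of the pool (assumed mapping for the example agent).
ROUTES: Dict[str, FrozenSet[str]] = {
    "trivial": frozenset({"loop"}),
    "lookup": frozenset({"retrieval"}),
    "relational": frozenset({"graph", "memory"}),
    "planning": frozenset({"team"}),
}


def classify(query: str) -> str:
    """Toy keyword rules standing in for a real intent/capability classifier."""
    q = query.lower()
    if "draft" in q or "migration" in q:
        return "planning"
    if "ever" in q or "changed" in q:
        return "relational"
    if "?" in q:
        return "lookup"
    return "trivial"


def handle(query: str) -> Dict[str, object]:
    query_class = classify(query)
    active = ROUTES[query_class]
    # Invoke only the selected subsystems; the rest of POOL is bypassed entirely.
    outputs = [POOL[name](query) for name in sorted(active)]
    # Compose the partial outputs into one response (here, plain concatenation).
    return {"class": query_class, "active": active, "response": "; ".join(outputs)}
```

With this toy classifier, `handle("thanks")` runs only the `loop` stub. The relational question runs only `graph` and `memory`. Keeping `ROUTES` as plain data is what makes the selector inspectable and testable, as the Forces section asks.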

What it gives you

A simple query pays only for the subsystems it activates, so common cheap traffic stops subsidising the cost and latency of rarely-needed heavy layers.
Each query class exercises a small, named subset, which shrinks the failure surface and narrows a failed query's cause to the few subsystems it activated.
The subsystem pool grows only when a query class demands it, so unused layers are never built or maintained.
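The cost claim is simple arithmetic. This is a minimal sketch that assumes each subsystem has a fixed per-invocation cost in some unit (latency or spend) and that costs add across active subsystems. Real costs are rarely this additive. The function names are hypothetical.

```python
from typing import Dict, FrozenSet


def query_cost(active: FrozenSet[str], unit_cost: Dict[str, float]) -> float:
    """A routed query pays only for the subsystems it activates."""
    return sum(unit_cost[name] for name in active)


def full_stack_cost(unit_cost: Dict[str, float]) -> float:
    """A fixed pipeline pays for every subsystem on every query."""
    return sum(unit_cost.values())


def expected_routed_cost(
    mix: Dict[str, float],
    routes: Dict[str, FrozenSet[str]],
    unit_cost: Dict[str, float],
) -> float:
    """Traffic-weighted cost: each query class's share of traffic times its subset's cost."""
    return sum(share * query_cost(routes[cls], unit_cost) for cls, share in mix.items())
```

A query's active set is drawn from the pool. When costs are non-negative, `query_cost` therefore never exceeds `full_stack_cost`. The gap between `expected_routed_cost` and `full_stack_cost` is the amount that cheap traffic stops subsidising.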

What it costs you

Routing quality becomes the dominant determinant of system quality, so a mis-tuned router degrades every query class at once.
Maintaining correct disjoint subsets as subsystems are added or retired is ongoing work, and a stale mapping silently routes to the wrong subset.
A heterogeneous pool of subsystems is harder to observe and reason about end to end than a single fixed pipeline.
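The stale-mapping risk can be caught mechanically if the routing table is kept as data. This is a minimal sketch that assumes the table maps each query class to a set of subsystem names and that the pool is known as a set of registered names. `validate_routes` is a hypothetical helper, not part of any library.

```python
from typing import Dict, FrozenSet, List, Set


def validate_routes(pool: Set[str], routes: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return human-readable problems with a routing table; an empty list means consistent."""
    problems: List[str] = []
    for query_class in sorted(routes):
        subset = routes[query_class]
        if not subset:
            problems.append(f"{query_class}: routes to an empty subset")
        # A route naming a retired or never-registered subsystem is the stale-mapping case.
        stale = subset - pool
        if stale:
            problems.append(f"{query_class}: routes to unknown or retired {sorted(stale)}")
    # Subsystems that no query class reaches carry maintenance cost with no traffic.
    routed = set().union(*routes.values())
    for name in sorted(pool - routed):
        problems.append(f"{name}: in the pool but no query class routes through it")
    return problems
```

Running a check like this whenever the pool or the table changes turns a silent misroute into a failed build. It does not judge whether a route is the *right* subset, which still needs per-class evaluation.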

What this pattern forbids. A subsystem runs for a query only when the router selects it; no subsystem may self-activate or be invoked outside the router's per-query decision, and a layer must not be added to the pool before a query class routes through it.
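Both prohibitions can be modelled in code. In this minimal sketch, the router hands out a per-query `Activation` handle, and that handle is the only route to a subsystem. The router also refuses to accept a pool member that no query class routes through. `Router`, `Activation`, and `NotActivatedError` are hypothetical names. Python cannot truly enforce privacy, so this models the rule rather than guaranteeing it.

```python
from typing import Callable, Dict, FrozenSet, List

Subsystem = Callable[[str], str]


class NotActivatedError(RuntimeError):
    """Raised when code tries to invoke a subsystem the router did not select."""


class Activation:
    """Per-query handle: subsystems are reachable only through this object."""

    def __init__(self, pool: Dict[str, Subsystem], selected: FrozenSet[str]) -> None:
        self._pool = pool
        self.selected = selected
        self.invoked: List[str] = []

    def call(self, name: str, query: str) -> str:
        # Refuse any subsystem outside this query's routed subset.
        if name not in self.selected:
            raise NotActivatedError(f"{name!r} was not selected for this query")
        self.invoked.append(name)
        return self._pool[name](query)


class Router:
    def __init__(
        self,
        pool: Dict[str, Subsystem],
        routes: Dict[str, FrozenSet[str]],
        classify: Callable[[str], str],
    ) -> None:
        # A subsystem may join the pool only if some query class already routes through it.
        unrouted = set(pool) - set().union(*routes.values())
        if unrouted:
            raise ValueError(f"no query class routes through {sorted(unrouted)}")
        self._pool = dict(pool)
        self._routes = dict(routes)
        self._classify = classify

    def activate(self, query: str) -> Activation:
        # The router's per-query decision is the single source of activation.
        return Activation(self._pool, self._routes[self._classify(query)])
```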

And the patterns that stand alongside it, or against it:

complements Complexity-Based Routing: Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
alternative to Dynamic Topology Routing: Form and dissolve the connections between agents at runtime by matching the task to candidate collaborators, instead of committing the multi-agent system to a fixed chain, star, or mesh up front.
complements Modular RAG: Decompose RAG into a typed three-layer hierarchy of Module Types, Modules, and Operators so the pipeline (routing, scheduling, fusion, retrieval, post-retrieval, generation) can be rearranged per query rather than running a fixed linear retrieve-then-generate.
alternative to Orchestrator-Workers: An orchestrator dynamically breaks a task into subtasks at runtime and delegates each to a worker LLM, then synthesises results.