XIII · Cognition & Introspection · Experimental

Reflexive Metacognitive Agent

also known as Self-Model Agent, Capability-Aware Agent

The agent maintains an explicit self-model of its own capabilities, confidence, and limitations, and reasons over that model when accepting, refusing, or handing off tasks.

This pattern helps complete certain larger patterns —

specialises Awareness ★: Maintain the agent's explicit knowledge of its own tools, capabilities, environment, and current context as queryable state.

Context

A team has an agent. The default agent accepts whatever task it is given and proceeds. There is no explicit self-model — the agent does not represent 'what I am good at' or 'what I should refuse'.

Problem

Without an explicit self-model, the agent has no principled way to refuse tasks outside its competence or to hand off to a more suitable peer. Refusals are ad hoc, driven by prompt-level instructions that are applied inconsistently across calls. This pattern differs from confidence reporting, which is per-output: here the self-model is an *input* to decision-making, not just an output.

Forces

Maintaining an explicit self-model requires upfront capability characterization.
The self-model drifts: the agent's actual capabilities change with model updates, so a stale self-model misstates them.
Reasoning over a self-model adds a step to every decision.

Example

A research agent's self-model: {capabilities: [literature-search, summarization], confidence: {medical-research: 0.6, legal-research: 0.3}, limitations: [no-citation-verification]}. Asked a legal-research question, the agent consults its self-model, sees 0.3 confidence, refuses with a reason, and hands off to a legal-specialist peer. Without the self-model, the agent would have attempted the task and produced low-quality output.
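The example above can be sketched in a few lines of Python. This is a minimal illustration, not the catalog's implementation: the 0.5 threshold, the `PEERS` registry, and the `triage` function are hypothetical names and values that the source does not specify.

```python
# Hypothetical encoding of the example self-model; field names follow the text.
SELF_MODEL = {
    "capabilities": ["literature-search", "summarization"],
    "confidence": {"medical-research": 0.6, "legal-research": 0.3},
    "limitations": ["no-citation-verification"],
}

# Assumed values: the source gives neither a threshold nor a peer registry.
MIN_CONFIDENCE = 0.5
PEERS = {"legal-research": "legal-specialist"}


def triage(task_class: str) -> dict:
    """Accept the task, or refuse with a reason and name a peer if one is known."""
    # An unknown task class is treated as zero confidence (an assumption).
    confidence = SELF_MODEL["confidence"].get(task_class, 0.0)
    if confidence >= MIN_CONFIDENCE:
        return {"action": "accept", "confidence": confidence}
    return {
        "action": "refuse",
        "reason": f"confidence {confidence} for {task_class} is below {MIN_CONFIDENCE}",
        "handoff_to": PEERS.get(task_class),  # None when no peer is registered
    }


print(triage("legal-research"))
print(triage("medical-research"))
```

Under these assumed values the legal-research request is refused with a reason and routed to the legal-specialist peer, while the medical-research request is accepted.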

Solution

Therefore:

The self-model is a structured artifact: {capabilities: [...], confidence-by-task-class: {...}, declared-limitations: [...]}. At task acceptance, the agent reasons over the self-model: Does this task fall within my capabilities? What is my confidence for this task class? Are any declared limitations triggered? The output is one of accept, refuse-with-reason, or handoff-to-peer-with-capability-X. The self-model is refreshed periodically against eval-suite results. Pair with confidence-reporting, decentralized-swarm-handoff, refusal, and typed-refusal-codes.
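One way to read the three acceptance questions and the periodic refresh is sketched below. Everything beyond the three self-model fields is an assumption made for illustration: the `Task` shape, the 0.5 threshold, the mapping of each failed check to refuse or handoff, and the choice to set confidence to the eval pass rate.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SelfModel:
    capabilities: set[str]
    confidence_by_task_class: dict[str, float]
    declared_limitations: set[str] = field(default_factory=set)


@dataclass
class Task:  # hypothetical task description, not defined by the source
    task_class: str
    required_capabilities: set[str]
    # Limitations that rule this task out if the agent declares them.
    blocking_limitations: set[str] = field(default_factory=set)


@dataclass
class Decision:
    outcome: str  # "accept", "refuse", or "handoff"
    reason: str = ""
    needed_capability: Optional[str] = None


def decide(model: SelfModel, task: Task, min_confidence: float = 0.5) -> Decision:
    # 1. Does this task fall within my capabilities? If not, hand off.
    missing = task.required_capabilities - model.capabilities
    if missing:
        need = sorted(missing)[0]
        return Decision("handoff", f"missing capability: {need}", need)
    # 2. What is my confidence for this task class? Unknown classes count as 0.
    conf = model.confidence_by_task_class.get(task.task_class, 0.0)
    if conf < min_confidence:
        return Decision("refuse", f"confidence {conf:.2f} below {min_confidence:.2f}")
    # 3. Are any declared limitations triggered?
    triggered = task.blocking_limitations & model.declared_limitations
    if triggered:
        return Decision("refuse", f"declared limitation: {sorted(triggered)[0]}")
    return Decision("accept")


def refresh(model: SelfModel, eval_results: dict[str, list[bool]]) -> None:
    """Overwrite per-class confidence with the pass rate of an eval run."""
    for task_class, outcomes in eval_results.items():
        if outcomes:  # skip classes with no eval cases
            model.confidence_by_task_class[task_class] = sum(outcomes) / len(outcomes)


model = SelfModel(
    capabilities={"literature-search", "summarization"},
    confidence_by_task_class={"medical-research": 0.6, "legal-research": 0.3},
    declared_limitations={"no-citation-verification"},
)
task = Task("legal-research", {"literature-search"})
print(decide(model, task))
refresh(model, {"legal-research": [True, True, True, False]})
print(decide(model, task))
```

With these values the first call refuses on low confidence; after the refresh raises legal-research to a 0.75 pass rate, the same task is accepted. Using the raw pass rate as confidence is the simplest possible refresh rule, and a real system might smooth or calibrate it instead.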

What this pattern forbids. The agent does not accept tasks without consulting its self-model; the self-model is an explicit artifact, not implicit prompt behavior.
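The prohibition can be enforced structurally rather than by convention. The sketch below is one hypothetical way to do that: `GatedAgent`, `TaskRefused`, and the `consult` and `execute` callables are illustrative names, not part of the catalog. The only path to execution runs through the self-model check.

```python
from __future__ import annotations

from typing import Callable


class TaskRefused(Exception):
    """Raised when the self-model check rejects a task."""


class GatedAgent:
    """Wraps task execution so it cannot run without a self-model check."""

    def __init__(
        self,
        consult: Callable[[str], tuple[bool, str]],
        execute: Callable[[str], str],
    ) -> None:
        self._consult = consult  # returns (accepted, reason)
        self._execute = execute  # private: callers go through run()

    def run(self, task: str) -> str:
        accepted, reason = self._consult(task)
        if not accepted:
            raise TaskRefused(reason)
        return self._execute(task)


# Toy self-model: a set of task names the agent accepts (an assumption).
known = {"summarization"}
agent = GatedAgent(
    consult=lambda t: (t in known, f"'{t}' is not in the self-model"),
    execute=lambda t: f"done: {t}",
)
print(agent.run("summarization"))
```

Calling `agent.run("legal-research")` here raises `TaskRefused` instead of attempting the task.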

And the patterns that stand alongside it, or against it —

complements Confidence Reporting ★: Surface the agent's uncertainty about its answer alongside the answer itself.
complements Decentralized Swarm Handoff ★: Agents in a swarm decide handoffs to peers based on a shared protocol with no central coordinator; specifically about agent-initiated handoff protocols, not topology.
complements Refusal ★★: Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
complements Typed Refusal Codes ★: Define a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad hoc human-readable messages.
complements Subject-First Agent Architecture (ENA Stateful Core) ·: Invert the LLM-centric pipeline: the agent is a stateful subject whose decision logic chooses whether to invoke the LLM at all, treating the model as one tool among many.
alternative-to False Confidence Syndrome ✕: Anti-pattern: the model produces incorrect answers with the same high confidence as correct ones, failing to vary its expressed certainty with its actual reliability (Oxford-documented for constraint-heavy prompts).
complements Confidence-Checking Workflow ★: Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
alternative-to Over-Helpfulness ✕: Anti-pattern: the agent prioritises responsiveness and task completion over correctness, producing confident output for a request beyond its capability or scope instead of abstaining, clarifying, or handing off.

Used in frameworks

Sparrot
first-class · 75 patterns · Domain Agents · experimental
A self-state module mirrors the agent's own loop signals (narration-loop index, plan-stall) back into the system prompt as a first-person self-model she reasons over when choosing…

References

17 Patrones de Arquitecturas Agénticas de IA
blog

Provenance

Source: patterns/reflexive-metacognitive-agent.md on GitHub · commit 0f962e5
Added to catalog: 2026-05-23
Last updated: 2026-05-23
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.