XIII · Cognition & Introspection · Experimental

Reflexive Metacognitive Agent

also known as Self-Model Agent, Capability-Aware Agent

The agent maintains an explicit self-model of its own capabilities, confidence, and limitations, and reasons over that model when accepting, refusing, or handing off tasks.

This pattern helps complete certain larger patterns —

  • specialises Awareness: Maintain the agent's explicit knowledge of its own tools, capabilities, environment, and current context as queryable state.

Context

A team has an agent. The default agent accepts whatever task it is given and proceeds. There is no explicit self-model — the agent does not represent 'what I am good at' or 'what I should refuse'.

Problem

Without an explicit self-model, the agent has no principled way to refuse tasks outside its competence or to hand off to a more suitable peer. Refusals are ad hoc, based on prompt-level instructions that are applied inconsistently across calls. This pattern differs from confidence reporting, which is per-output, by making the self-model an *input* to decision-making, not just an output.

Forces

  • Maintaining an explicit self-model requires upfront capability characterization.
  • The self-model drifts: the agent's actual capabilities change with model updates, so a recorded self-model goes stale.
  • Reasoning over a self-model adds a step to every decision.

Example

A research agent's self-model: {capabilities: [literature-search, summarization], confidence: {medical-research: 0.6, legal-research: 0.3}, limitations: [no-citation-verification]}. Asked a legal-research question, the agent consults its self-model, sees a confidence of 0.3, refuses with a reason, and hands off to a legal-specialist peer. Without the self-model, the agent would have attempted the task and produced low-quality output.
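The example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the 0.5 threshold, the `PEERS` registry, the `triage` function name, and the choice to treat unknown task classes as zero confidence are all assumptions that the pattern itself does not specify.

```python
# Hypothetical self-model mirroring the research-agent example.
SELF_MODEL = {
    "capabilities": ["literature-search", "summarization"],
    "confidence": {"medical-research": 0.6, "legal-research": 0.3},
    "limitations": ["no-citation-verification"],
}

# Assumed values: the pattern gives neither a threshold nor a peer registry.
MIN_CONFIDENCE = 0.5
PEERS = {"legal-research": "legal-specialist"}


def triage(task_class: str) -> dict:
    """Consult the self-model before attempting a task of the given class."""
    # Assumption: a task class missing from the self-model counts as 0.0.
    confidence = SELF_MODEL["confidence"].get(task_class, 0.0)
    if confidence >= MIN_CONFIDENCE:
        return {"action": "accept", "confidence": confidence}

    reason = f"confidence {confidence} for {task_class} is below {MIN_CONFIDENCE}"
    peer = PEERS.get(task_class)
    if peer is not None:
        # Refuse with a reason and name the peer that should take over.
        return {"action": "handoff", "peer": peer, "reason": reason}
    return {"action": "refuse", "reason": reason}


print(triage("legal-research"))
print(triage("medical-research"))
```

With these assumed values, the legal-research question is handed off to the legal-specialist peer with a stated reason, while the medical-research question is accepted.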

Solution

Therefore:

The self-model is a structured artifact: {capabilities: [...], confidence-by-task-class: {...}, declared-limitations: [...]}. At task acceptance, the agent reasons over the self-model: Does this task fall within my capabilities? What is my confidence for this task class? Are any declared limitations triggered? The output is one of accept, refuse-with-reason, or handoff-to-peer-with-capability-X. The self-model is refreshed periodically against eval-suite results. Pair with confidence-reporting, decentralized-swarm-handoff, refusal, and typed-refusal-codes.
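One way to make the three acceptance checks and the periodic refresh concrete is sketched below. Everything beyond the three self-model fields is hypothetical: the `Task` shape (including its `blocked_by` field), the `Outcome` and `Decision` types, the default threshold, and the rule that failed capability or confidence checks lead to a handoff while a triggered limitation leads to a refusal. The pattern leaves these choices open, and the refresh here simply assumes the eval suite yields a pass rate per task class.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCEPT = "accept"
    REFUSE = "refuse-with-reason"
    HANDOFF = "handoff-to-peer"


@dataclass
class SelfModel:
    capabilities: set[str]
    confidence_by_task_class: dict[str, float]
    declared_limitations: set[str]


@dataclass
class Task:
    task_class: str
    required_capabilities: set[str]
    # Hypothetical field: limitations that make this task unsafe to accept,
    # e.g. a task needing verified citations vs. "no-citation-verification".
    blocked_by: set[str] = field(default_factory=set)


@dataclass
class Decision:
    outcome: Outcome
    reason: str = ""
    needed_capability: Optional[str] = None


def decide(model: SelfModel, task: Task, min_confidence: float = 0.5) -> Decision:
    """Run the three self-model checks in order and return a typed decision."""
    # 1. Does the task fall within my capabilities?
    missing = task.required_capabilities - model.capabilities
    if missing:
        cap = sorted(missing)[0]
        return Decision(Outcome.HANDOFF, f"missing capability: {cap}", cap)

    # 2. What is my confidence for this task class? (unknown class -> 0.0)
    confidence = model.confidence_by_task_class.get(task.task_class, 0.0)
    if confidence < min_confidence:
        reason = f"confidence {confidence} below {min_confidence}"
        return Decision(Outcome.HANDOFF, reason, task.task_class)

    # 3. Are any declared limitations triggered?
    triggered = task.blocked_by & model.declared_limitations
    if triggered:
        return Decision(Outcome.REFUSE, f"limitation triggered: {sorted(triggered)[0]}")

    return Decision(Outcome.ACCEPT)


def refresh(model: SelfModel, eval_pass_rates: dict[str, float]) -> None:
    """Overwrite per-class confidence with the latest eval-suite pass rates."""
    model.confidence_by_task_class.update(eval_pass_rates)
```

Returning a `Decision` value rather than free text keeps the refusal reason and the needed capability machine-readable, which is what lets this pattern pair with typed refusal codes and peer handoff.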

What this pattern forbids. The agent does not accept tasks without consulting its self-model; the self-model is an explicit artifact, not implicit prompt behavior.

And the patterns that stand alongside it, or against it —

  • complements Confidence Reporting: Surface the agent's uncertainty about its answer alongside the answer itself.
  • complements Decentralized Swarm Handoff: Agents in a swarm decide handoffs to peers based on a shared protocol with no central coordinator; specifically about agent-initiated handoff protocols, not topology.
  • complements Refusal ★★: Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
  • complements Typed Refusal Codes: Define a single source of truth for machine-readable refusal codes across all guard surfaces, so refusals can be triaged mechanically rather than by string-grepping ad hoc human-readable messages.
  • complements Subject-First Agent Architecture (ENA Stateful Core): Invert the LLM-centric pipeline: the agent is a stateful subject whose decision logic chooses whether to invoke the LLM at all, treating the model as one tool among many.
  • alternative-to False Confidence Syndrome: Anti-pattern: the model produces incorrect answers with the same high confidence as correct ones, failing to vary its expressed certainty with its actual reliability (Oxford-documented for constraint-heavy prompts).
  • complements Confidence-Checking Workflow: Always ask the agent, for each part of its output, to state its confidence and identify which parts need human verification, like triaging a junior analyst's work.
